Development of Cell Processor Software for VLBI tools
Software correlation
Metsähovi has pioneered the development of software correlators for VLBI. Unlike other institutes working on the same problem, we decided to use the new Cell processor, which is found in blade servers and the Sony PlayStation 3. The Cell processor shows great promise for heavy computation; for example, Los Alamos National Laboratory uses it in its new "Roadrunner" supercomputer.
Cell processor basics
The Cell Broadband Engine architecture differs from conventional microprocessor architectures. A conventional processor is designed to maximize the speed of the main CPU. In the Cell processor, the heavy calculation is delegated to eight vector processors (SPUs, synergistic processor units), and the main processor is a scaled-down version of the IBM PowerPC processor.
The vector processors make the Cell processor very powerful in some calculation-oriented tasks, and software correlation is one of them. The drawback is that the processor is difficult to program: each SPU is heavily pipelined and must be programmed with assembler-like commands if full speed is required. Even an ordinary vectorized program reaches only 5-10% of the processor's speed.
The DiFX correlator
Our first approach, starting in January 2007, was to port the Australian DiFX correlator program to the Cell processor. DiFX was developed at Swinburne University by Adam Deller and is used for production correlation in Australia.
The DiFX program is designed for a cluster of Intel-architecture computers and relies heavily on the Intel Performance Primitives. A generic version of DiFX was successfully produced in spring 2007 and ported to both Intel and Cell architecture processors. However, the performance of the generic version was not as good as expected, even when the code was vectorized: the Cell processor was still two times slower than a dual-core Intel processor.
The cause proved to be inter-process and inter-processor communication. DiFX had been designed for relatively slow Intel-architecture processors, and it divided the task into too many small parts.
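The granularity problem can be illustrated with a toy cost model. The sketch below is not derived from DiFX measurements: the function name `total_time` and every number in it are hypothetical, chosen only to show how a fixed cost per hand-over can come to dominate when the same amount of work is cut into many small parts.

```python
def total_time(work_seconds, n_chunks, per_message_overhead):
    """Toy model: pure compute time plus a fixed cost for every chunk handed over.

    All parameters are hypothetical illustration values, not DiFX measurements.
    """
    return work_seconds + n_chunks * per_message_overhead


# The same amount of computation, split coarsely versus finely.
coarse = total_time(work_seconds=1.0, n_chunks=10, per_message_overhead=0.001)
fine = total_time(work_seconds=1.0, n_chunks=10_000, per_message_overhead=0.001)

print(f"coarse split: {coarse:.2f} s, fine split: {fine:.2f} s")
```

In this model the compute time is identical in both cases; only the number of hand-overs changes, so a faster compute unit makes the communication share of the total larger, not smaller.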
Metsähovi correlation engine
Our second approach was to turn Daniel Hackenberg's matmul (matrix multiplication) program into a correlator engine. The calculation-heavy parts of the correlator were extracted from DiFX and inserted into matmul, and the loops were heavily unrolled.
An immediate tenfold speedup was observed. Some fine-tuning (separating the real and imaginary parts of complex numbers into separate tables, and new, improved sin and cos functions) sped up the correlation still further.
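The idea of keeping real and imaginary parts in separate tables can be sketched in a few lines. The example below is an illustrative Python sketch, not the Metsähovi SPU code: the function name `cross_multiply_accumulate` and the sample values are assumptions, and it assumes the kernel accumulates the product of one station's spectrum with the complex conjugate of another's. With the split layout, the loop body contains only real multiplies and adds on values stored side by side, which is the form that suits vector instructions.

```python
def cross_multiply_accumulate(a_re, a_im, b_re, b_im, acc_re, acc_im):
    """Add a[k] * conj(b[k]) to the accumulator for every channel k.

    Real and imaginary parts live in separate lists (split layout), so the
    loop body contains only real multiplies and adds.
    """
    for k in range(len(a_re)):
        # (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
        acc_re[k] += a_re[k] * b_re[k] + a_im[k] * b_im[k]
        acc_im[k] += a_im[k] * b_re[k] - a_re[k] * b_im[k]


# Hypothetical two-channel spectra from two stations.
a = [1 + 2j, 3 - 1j]
b = [0.5 - 1j, 2 + 2j]

# Convert from interleaved complex values to the split layout.
a_re, a_im = [z.real for z in a], [z.imag for z in a]
b_re, b_im = [z.real for z in b], [z.imag for z in b]
acc_re, acc_im = [0.0, 0.0], [0.0, 0.0]

cross_multiply_accumulate(a_re, a_im, b_re, b_im, acc_re, acc_im)

# Reference result using Python's built-in complex arithmetic.
expected = [x * y.conjugate() for x, y in zip(a, b)]
```

The accumulators can be compared with `expected` to check the split-layout arithmetic against ordinary complex multiplication.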
Results
Preliminary results show that the existing hardware correlators can be replaced with a small number of Sony PlayStation 3s.
Replacing the correlators might be more cost-effective than maintaining and upgrading them, since even the newest correlators were designed fifteen years ago.
This work has received financial support under the EU FP6 Integrated Infrastructure Initiative contract number #026642, EXPReS.