A conventional processor dedicates a relatively large fraction of its transistors to complex control logic in order to maximise the performance of serial code. The Cell processor contains 8 fast Synergistic Processing Elements (SPEs) designed to maximise arithmetic throughput. Graphics processing units (GPUs) have a very large number of slower cores that maximise parallel throughput.
All this computational power comes at the cost of a change in programming paradigm. An existing application would run on the Cell processor using only the PPE core (the Cell's general-purpose core), without any performance benefit. Therefore, to obtain the maximum performance, it is necessary to use all the SPEs and to adapt the code to the underlying hardware architecture. This means addressing vectorization, memory alignment, and communication between main memory and the local stores. On GPUs, a convenient programming environment (CUDA) is now available, which helps considerably in exploiting the full potential of these devices.
HOW DO WE ASSIGN CREDITS?
On a standard PC, BOINC assigns credits based on the average of the floating-point and integer performance of the machine, as measured by a set of benchmarks run by the client, regardless of the real performance of the application on that machine.
Credits = 0.5(million float ops/sec + million int ops/sec)/864,000 * (cpu time in seconds),
(each unit of BOINC credit, the Cobblestone, corresponds to 864,000 million operations, i.e. 1/100 of a day on a 1,000 MIPS machine)
where "float ops" are floating-point operations and "int ops" are integer operations. On the Cell processor these benchmarks are of course wrong, because they do not use the SPEs. The same applies to GPUs, which the benchmarks do not consider at all. In any case, as we said, these benchmarks only indicate the speed of the machine, not the speed of the application.
For instance, this machine returns the following benchmark by the BOINC client:
GenuineIntel Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz [Family 6 Model 15 Stepping 11]
Number of CPUs 2
Measured floating point speed 2281.82 million ops/sec
Measured integer speed 6348.82 million ops/sec
The average of the two figures is therefore about 4315 MIPS (million instructions per second), or equivalently about 17.98 Cobblestones/hour by the formula above. BOINC will thus assign this machine about 18 Cobblestones for each CPU hour of calculation. Note that the ratio of integer to floating-point speed is approximately 3 (6348/2281 ≈ 2.8).
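As a quick check of the arithmetic, the benchmark-based formula can be sketched in a few lines of Python. This is only an illustration of the formula quoted above, not the BOINC client's actual code; the function and constant names are made up for the example. With the two benchmark figures listed above as inputs, it evaluates to roughly 18 Cobblestones per CPU hour.

```python
# Illustrative sketch of the benchmark-based credit formula quoted above.
# Names are hypothetical; this is not taken from the BOINC source code.

# One Cobblestone corresponds to 864,000 million operations.
MILLION_OPS_PER_COBBLESTONE = 864_000


def benchmark_credits(float_mops: float, int_mops: float, cpu_seconds: float) -> float:
    """Credits = 0.5 * (float Mops/s + int Mops/s) / 864,000 * CPU seconds."""
    average_mops = 0.5 * (float_mops + int_mops)
    return average_mops / MILLION_OPS_PER_COBBLESTONE * cpu_seconds


# Benchmark figures reported for the Core 2 Duo E6550 above.
credits_per_hour = benchmark_credits(2281.82, 6348.82, cpu_seconds=3600)
print(f"{credits_per_hour:.2f} Cobblestones per CPU hour")
```

Note that the result depends only on the benchmark figures and the CPU time, never on what the application actually computes.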
The way we assign credits takes these facts into account.
First of all, we need to measure the floating-point performance of the application. We have built a performance model of our applications (CELLMD and ACEMD) by manually counting the number of floating-point operations per step. For a specific WU, we can compute how many floating-point operations are performed in total, on average, depending on the number of atoms, the number of steps and so on. For CELLMD it was also possible to verify that the estimated flops were within a few percent of the real value (multiplication, addition, subtraction, division and reciprocal square root are each counted as a single floating-point operation). On a GPU, we can also use the interpolating texture units instead of computing some expensive expressions. In this case, as the CPU has nothing similar, we count the number of floating-point operations of the equivalent expression. It is not easy to measure the number of integer operations, so we estimate the MIPS to be 2 times the number of floating-point operations (in fact, we reckon that a factor of up to 3 would be justified, as in the example above). Therefore,
Credits = 0.5(MFLOP per WU + approx MIPS per WU)/864,000
(MFLOP is millions of floating-point operations; the approximate MIPS term is the estimated millions of integer operations per WU, taken as 2 times the MFLOP)
Finally, note that this method awards credits for the real performance of the application, not for a benchmark as the BOINC client does, so it is somewhat penalised in comparison.
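The application-based formula can be sketched the same way. This is a minimal illustration under the assumptions stated in the text (integer operations estimated as 2 times the floating-point operations); the function name, the parameter names and the example work-unit size are hypothetical and do not come from the project's server code.

```python
# Illustrative sketch of the application-based credit formula above.
# Names and the example WU size are hypothetical.

MILLION_OPS_PER_COBBLESTONE = 864_000


def application_credits(mflop_per_wu: float, int_to_float_ratio: float = 2.0) -> float:
    """Credits = 0.5 * (MFLOP per WU + approx. million int ops per WU) / 864,000.

    The integer-operation count is not measured; it is estimated as
    int_to_float_ratio times the counted floating-point operations.
    """
    approx_int_mops_per_wu = int_to_float_ratio * mflop_per_wu
    return 0.5 * (mflop_per_wu + approx_int_mops_per_wu) / MILLION_OPS_PER_COBBLESTONE


# Hypothetical work unit performing 1e15 floating-point operations (1e9 MFLOP).
example_mflop = 1.0e9
print(f"factor 2: {application_credits(example_mflop):.1f} credits")
print(f"factor 3: {application_credits(example_mflop, 3.0):.1f} credits")
```

With the factor of 2, the credit for a WU is 1.5 times its MFLOP count divided by 864,000; a factor of 3 would raise this to 2 times, which is why the chosen factor is on the conservative side.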
In molecular dynamics, speed is critical and we put all our efforts into providing the most efficient molecular dynamics codes. To give you an idea, the development of these codes took literally years of work. Read more on the performance and efficiency of our applications: