
Abstract

Parallel HPC applications have benefited from multi-core CPU technology, which has multiplied computation density by a factor of 2 to 4 and later by 8. This improvement is not enough for the computational requirements of today's applications, which is why people have been looking for new hardware and specialized processors that could give applications speedups of 20 to 100 times.

Specialized processors such as GPUs have improved their performance faster than Moore's law predicts. Ten years ago they were built on a 350 nm process with 5 million transistors at 75 MHz; today they use a 55 nm process with 700 million transistors at 800 MHz and deliver 512 GFLOPS, or more than 3.5 GFLOPS/W. This corresponds to improvement factors of 1.7x/year in transistor count, 1.3x/year in clock speed, 2.0x/year in processing units and 1.3x/year in memory bandwidth. Combining such powerful dedicated processors with CPUs, both of them multi-core, in a highly parallel environment creates the need to use this heterogeneous hybrid computing environment as efficiently as possible.

This hardware environment exists and can be used today. The first challenge is on the software development side. Development tools need to integrate heterogeneous and multi-core programming at the core of their languages, supporting code generation for different processor types as well as handling asynchronous behavior. This requires compilers and libraries that are designed, or extended, for this purpose. These tools also need to support multiple hardware platforms if standards are to emerge.

The second key challenge is the evolution of the buses and bandwidth linking the different CPU and GPU cores together, and of the way these cores communicate with each other.
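The per-year factors quoted above follow from compound growth between two GPU generations. The minimal sketch below assumes an exact 10-year span (the abstract only says "10 years ago") and uses the endpoints given in the text; it yields roughly 1.64x/year for transistor count and 1.27x/year for clock speed, close to the rounded 1.7x and 1.3x figures. The processing-unit and memory-bandwidth factors are not recomputed because the text gives no endpoints for them. The helper name `annual_growth` is hypothetical.

```python
def annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth factor g such that start * g**years == end."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1.0 / years)


# Endpoints quoted in the abstract; the exact 10-year span is an assumption.
YEARS = 10
transistors = annual_growth(5e6, 700e6, YEARS)  # ~1.64x/year
clock = annual_growth(75.0, 800.0, YEARS)       # ~1.27x/year

# 512 GFLOPS at more than 3.5 GFLOPS/W implies a power draw below ~146 W.
max_power_watts = 512.0 / 3.5

if __name__ == "__main__":
    print(f"transistor count: {transistors:.2f}x/year")
    print(f"clock speed:      {clock:.2f}x/year")
    print(f"implied power:    < {max_power_watts:.0f} W")
```

Compound growth, rather than dividing the total ratio by the number of years, is used because the quoted factors are multiplicative per year.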
Fusion projects will address the evolution of these interconnects by defining new architectures around these processors that improve the data flow between them, which will be the key to using all the available power. Crossbar memory controllers will allow GPUs to talk to each other very quickly without breaking parallelism. The HyperTransport bus will improve communication between GPUs and CPUs. Finally, multi-core GPUs and CPUs on the same die will increase compute density even further.

Different benchmarks and application codes have already been used to demonstrate the benefits of such architectures. We will present SGEMM results as well as results for other algorithms. The results highlight the fact that performance is currently affected by copying data in and out of the GPU, and that finer tuning allows huge jumps in performance. We will also show that restructuring algorithms originally implemented for the CPU to fit the GPU architecture brings further performance gains.
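The effect of in/out data copies on SGEMM performance can be illustrated with a simple cost model. This is a hedged sketch, not the authors' benchmark: it assumes an n x n SGEMM with beta = 0 (so C is only copied back, not in), no overlap between transfers and computation, and hypothetical placeholder values for kernel throughput and host-to-GPU link bandwidth. The function name `sgemm_effective_gflops` is illustrative only.

```python
# Hypothetical cost model: counts host<->GPU copies for C = A @ B (beta = 0),
# with no overlap between transfers and compute. Not measured data.
FLOAT_BYTES = 4  # single precision


def sgemm_effective_gflops(n: int, kernel_gflops: float, link_gb_per_s: float) -> float:
    """Effective GFLOPS of an n x n SGEMM once copy-in/copy-out time is included."""
    if n <= 0 or kernel_gflops <= 0 or link_gb_per_s <= 0:
        raise ValueError("all arguments must be positive")
    flops = 2.0 * n ** 3                          # one multiply and one add per term
    bytes_moved = 3 * n * n * FLOAT_BYTES         # A and B copied in, C copied out
    t_compute = flops / (kernel_gflops * 1e9)     # seconds spent in the kernel
    t_copy = bytes_moved / (link_gb_per_s * 1e9)  # seconds spent on transfers
    return flops / (t_compute + t_copy) / 1e9


if __name__ == "__main__":
    # Placeholder figures, chosen only for illustration.
    KERNEL_GFLOPS = 300.0
    LINK_GB_PER_S = 3.0
    for n in (256, 1024, 4096):
        eff = sgemm_effective_gflops(n, KERNEL_GFLOPS, LINK_GB_PER_S)
        print(f"n={n:5d}: {eff:6.1f} GFLOPS effective "
              f"({100 * eff / KERNEL_GFLOPS:.0f}% of kernel rate)")
```

Because computation grows as n^3 while transferred data grows only as n^2, larger matrices amortize the copies better; with these placeholder numbers the n = 256 case spends most of its time on transfers. Overlapping copies with computation through asynchronous transfers would require a different model.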


DOI: 10.3997/2214-4609.201405040
2008-06-09
2024-04-24