Abstract

Microprocessors have been hitting the limits of attainable clock frequencies for the past few years, resulting in the current multi-core processor solutions provided by the major microprocessor vendors. Multiple cores on a chip must share the same pins to reach the memory system and the communication channels to other machines. This leads to a “memory wall”, since the number of pins per chip does not scale with the number of cores, and a “power wall”, since chips must still be cooled within the same physical space. Many geophysically important applications, such as finite-difference modeling, downward-continuation-based migration, and sparse matrix solvers, already exhibit significantly worse than linear scaling on multiple cores, a problem that will only worsen as the major microprocessor vendors move beyond quad-core chips to many-core architectures. Maxeler streaming accelerators implemented on Field Programmable Gate Arrays (FPGAs) allow us to bypass the memory wall by minimizing access to external memory and explicitly forwarding data on-chip at very high bandwidth (over 10 TB/s on the latest chips). The high performance attainable with such architectures has been established for a range of applications (for example [1], [2], [3]). At the same time, since FPGA performance is achieved through massive parallelism at relatively low clock frequencies (hundreds of MHz), we avoid the “power wall” and can configure our FPGA-based HPC systems very densely, with accompanying savings in operational costs for power, space, maintenance, etc.

/content/papers/10.3997/2214-4609.20149993
2010-06-13
2024-04-29