Abstract

In this work we propose a parallel implementation of reverse time migration (RTM) based on cooperative work between CPUs and coprocessors, which proved competitive with other available accelerated solutions. The implementation runs with any number of coprocessors (from 0 up to the maximum allowed by the computer vendor's specifications) and scales well in a cluster environment. Because it is based on a standard programming model, it will also be portable without modification to any future configuration of Xeon and Xeon Phi, or of X-CPU + Y-CPU, that supports MPI + OpenMP + C. Here we describe our unified programming model for optimized code. We also discuss load balancing for the heterogeneous cluster configuration and validate the performance and scalability of the current implementation. In the current configuration, with 4 Xeon Phi cards of 16 GB GDDR5 each (64 GB total), we can migrate full shot gathers on a single node. The proposed node configuration also frees memory on the 2-socket host for RTM formulations that might require saving snapshots for cross-correlation, or any other auxiliary arrays, between iterations of the algorithm.
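
The abstract does not spell out the load-balancing scheme, so the following is only a minimal sketch of one common approach: splitting the planes of the computational grid among the host and the coprocessors in proportion to a relative throughput weight per device. The function name `split_planes`, the integer weights, and the choice of grid planes as the unit of work are all assumptions made for illustration, not details taken from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical static load balancer (illustrative, not the paper's method).
 *
 * Splits n_planes grid planes among n_dev devices in proportion to
 * weight[i], an assumed relative throughput of device i (for example,
 * index 0 for the host CPUs and indices 1..4 for coprocessors).
 * The plane count of device i is written to count[i].
 *
 * Returns 0 on success, -1 on invalid input.
 */
int split_planes(size_t n_planes, const unsigned *weight,
                 size_t n_dev, size_t *count)
{
    if (n_dev == 0 || weight == NULL || count == NULL)
        return -1;

    uint64_t total = 0;
    for (size_t i = 0; i < n_dev; ++i)
        total += weight[i];
    if (total == 0)
        return -1;

    /* Place each device's upper boundary at the rounded-down share of
     * the cumulative weight. Differences of consecutive boundaries give
     * the per-device counts, so the counts always sum to n_planes. */
    uint64_t cum = 0, prev = 0;
    for (size_t i = 0; i < n_dev; ++i) {
        cum += weight[i];
        uint64_t bound = (uint64_t)n_planes * cum / total;
        count[i] = (size_t)(bound - prev);
        prev = bound;
    }

    /* The last boundary is n_planes * total / total. */
    assert(prev == n_planes);
    return 0;
}
```

With assumed weights of 2 for the host and 1 for each of four coprocessors, 600 planes would be split as 200 for the host and 100 per coprocessor. Passing a single weight covers the case of zero coprocessors, where the host receives every plane. In practice the weights would have to be measured on the target hardware.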

/content/papers/10.3997/2214-4609.20141921
2014-09-07
2024-04-18