In this paper, we design an Intel Xeon Phi-based cluster specifically tuned for Reverse Time Migration (RTM) and compare its performance to our current optimized architecture, which consists of NVIDIA GPU-accelerated nodes. The Xeon Phi nodes are designed to offer a good balance between co-processor computing capabilities, host computing capabilities, local scratch bandwidth and network bandwidth. Performance is evaluated at the system level, covering not only the wave propagation kernels but also wavefield checkpointing to the local scratch storage and the pre- and post-processing steps. Moreover, the kernels are tuned for the Xeon Phi and compared to our GPU-optimized kernels, taking the tuning effort into account. Overall, the obtained performance is comparable to that of our GPU-based architecture, while offering better portability, since most of the code is identical to the CPU implementation.
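To make the two RTM workloads mentioned in the abstract concrete (the wave propagation kernel and wavefield checkpointing), here is a minimal, hypothetical Python/NumPy sketch. It assumes a 1D constant-density acoustic model, a second-order leapfrog finite-difference stencil, fixed (zero-pressure) boundaries, and an in-memory dictionary standing in for the local scratch storage. The function names (`step`, `ricker`, `forward_with_checkpoints`, `crosscorrelation_image`) are illustrative only and are not taken from the authors' Xeon Phi or GPU kernels.

```python
import numpy as np


def step(p_prev, p_curr, vel, dt, dx):
    """One leapfrog step of p_tt = v^2 p_xx (2nd order in time and space).
    Ends are held at p = 0 (a simplifying assumption, no absorbing layer)."""
    lap = np.zeros_like(p_curr)
    lap[1:-1] = (p_curr[2:] - 2.0 * p_curr[1:-1] + p_curr[:-2]) / dx**2
    p_next = 2.0 * p_curr - p_prev + (vel * dt) ** 2 * lap
    p_next[0] = p_next[-1] = 0.0
    return p_next


def ricker(nt, dt, f0):
    """Ricker wavelet with peak frequency f0, delayed by 1/f0 seconds."""
    t = np.arange(nt) * dt - 1.0 / f0
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def forward_with_checkpoints(vel, dt, dx, nt, src_ix, wavelet, every):
    """Propagate the source wavefield and keep a snapshot every `every` steps.
    checkpoints[it] holds the wavefield after time step it. In a production
    RTM these snapshots would be written to local scratch storage instead."""
    p_prev = np.zeros(vel.size)
    p_curr = np.zeros(vel.size)
    checkpoints = {}
    for it in range(nt):
        p_next = step(p_prev, p_curr, vel, dt, dx)
        # Discretized source term dt^2 * f of p_tt = v^2 p_xx + f.
        p_next[src_ix] += dt**2 * wavelet[it]
        p_prev, p_curr = p_curr, p_next
        if it % every == 0:
            checkpoints[it] = p_curr.copy()
    return checkpoints


def crosscorrelation_image(src_snaps, rcv_snaps):
    """Zero-lag cross-correlation imaging condition, summed over the
    checkpointed steps present in both snapshot dictionaries."""
    first = next(iter(src_snaps.values()))
    image = np.zeros_like(first)
    for it, s in src_snaps.items():
        r = rcv_snaps.get(it)
        if r is not None:
            image += s * r
    return image
```

For example, with `vel = np.full(201, 2000.0)`, `dx = 10.0` and `dt = 0.002` (Courant number 0.4, within the 1D stability limit), calling `forward_with_checkpoints(vel, dt, dx, 500, 100, ricker(500, dt, 10.0), every=10)` returns 50 snapshots. The imaging step needs the source wavefield at the same times as the back-propagated receiver wavefield, which is why snapshots are stored during the forward pass; the storage volume and the bandwidth needed to write and read it back are what make scratch bandwidth a design parameter of the nodes.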
