Eighth EAGE High Performance Computing Workshop
- Conference date: September 16-18, 2024
- Location: KAUST, Saudi Arabia
- Published: 16 September 2024
-
-
Optimizing SRME: Exploiting the Power of GPUs and High-Core Count CPUs
Authors: L. Casasanta
Summary: In this work, we review the optimization compromises made for running Surface-Related Multiple Elimination (SRME) on GPUs and high-core-count CPUs. The most important optimization is reducing disk reads by caching input data in the fastest levels of memory. Other optimizations include running asynchronous tasks, buffering the data-movement pipeline, and recycling redundant calculations when outputting results in common-shot order. However, care must be taken not to introduce an excessive memory footprint for large datasets. We therefore prioritize the available memory for reducing the number of disk reads, serializing some steps in the data-movement pipeline and splitting large shots with an offset-based binning scheme.
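To make the offset-based binning idea concrete, here is a minimal Python sketch that splits one large shot gather into offset-sorted bins of bounded size; the function and parameter names (split_shot_by_offset, max_traces_per_bin) are hypothetical and not taken from the SRME implementation described above.

```python
import numpy as np

def split_shot_by_offset(offsets, traces, max_traces_per_bin):
    """Split one shot gather into offset-based bins so that each bin
    fits within a fixed memory budget (hypothetical helper, shown only
    to illustrate the binning idea)."""
    order = np.argsort(offsets)                 # sort traces by offset
    bins = []
    for start in range(0, len(order), max_traces_per_bin):
        idx = order[start:start + max_traces_per_bin]
        bins.append((offsets[idx], traces[idx]))
    return bins

# Usage: a shot with 10,000 traces of 2,000 samples, binned into chunks
rng = np.random.default_rng(0)
offsets = rng.uniform(0.0, 6000.0, size=10_000)
traces = rng.standard_normal((10_000, 2_000)).astype(np.float32)
for off_bin, trc_bin in split_shot_by_offset(offsets, traces, 2_500):
    pass  # each bin would be processed (and cached) independently
```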
-
-
-
Full Wave Imaging beyond GPU Memory Limits
Authors: N. Bienati, L. Bortot and J. Panizzardi
Summary: Full-Wave Imaging is evolving in the direction of higher frequencies/resolution and more accurate wave-propagation physics. Both factors put significantly more pressure on efficient and effective memory management, because the large increase in required memory conflicts with the need to guarantee data locality to maximize performance. The memory hierarchy must be explicitly managed to reach the optimal tradeoff. At the same time, unnecessary code complexity must be hidden from the geophysical programmer to maximize effectiveness in developing better imaging algorithms.
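As a rough illustration of explicit memory-hierarchy management, the following Python sketch keeps a bounded number of wavefield snapshots in a "fast" tier and evicts older ones to a "slow" tier; the SnapshotCache class and its LRU policy are assumptions for illustration, not the authors' implementation.

```python
import collections
import numpy as np

class SnapshotCache:
    """Conceptual two-tier store for wavefield snapshots: a small 'fast'
    tier (standing in for GPU memory) backed by a larger 'slow' tier
    (standing in for host memory)."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = collections.OrderedDict()   # LRU order on the fast tier
        self.slow = {}

    def put(self, step, snapshot):
        self.fast[step] = snapshot
        self.fast.move_to_end(step)
        while len(self.fast) > self.fast_capacity:
            old_step, old_snap = self.fast.popitem(last=False)
            self.slow[old_step] = old_snap       # evict to the slow tier

    def get(self, step):
        if step in self.fast:
            self.fast.move_to_end(step)
            return self.fast[step]
        snap = self.slow.pop(step)               # promote back to fast tier
        self.put(step, snap)
        return snap

cache = SnapshotCache(fast_capacity=4)
for t in range(16):
    cache.put(t, np.zeros((128, 128), dtype=np.float32))
_ = cache.get(3)  # older snapshot transparently fetched from the slow tier
```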
-
-
-
Optimizing GAN Training for 3D Seismic Microstructure Generation
Authors: Y. Ghazal, M. Awadalla, D. Barradas, A. Ayyad, A. Nasr and S. Ghani
Summary: In computational geophysics, generating precise 3D microstructures from seismic data is crucial for detailed subsurface analysis. Traditional methods often fall short of the necessary resolution, but Generative Adversarial Networks (GANs), specifically SliceGAN, have proven effective. These networks allow the generation of large volumes of statistically representative microstructures, improving the simulation of material properties based on their microstructural traits. However, the efficiency of GAN training is critical for both feasibility and accuracy. This study introduces an optimized GAN training method using a distributed data parallel (DDP) strategy within the PyTorch Lightning framework to leverage modern GPU computational power. Significant adaptations were made to the original SliceGAN code to incorporate PyTorch Lightning, enabling DDP across multiple GPUs, which substantially reduces training times and improves scalability. The method was tested on NVIDIA V100 and A100 GPUs, demonstrating near-linear scalability and a potential speedup of 48 times with eight A100 GPUs. This optimized training process notably improves the generation of complex, high-fidelity 3D microstructures essential for geophysical analysis, highlighting the advantages of PyTorch Lightning in scenarios requiring high scalability and rapid execution, and offering substantial benefits for geophysical research and exploration.
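A minimal sketch of how distributed data parallel training is configured in PyTorch Lightning is given below; the TinyGenerator module and the random data are placeholders, since the actual SliceGAN generator/discriminator logic and dataset are not reproduced here, and the script assumes one or more CUDA GPUs are available.

```python
import torch
import pytorch_lightning as pl

class TinyGenerator(pl.LightningModule):
    """Minimal stand-in for a GAN module; the real SliceGAN logic is
    omitted, this only shows how DDP training is wired up."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(64, 256), torch.nn.ReLU(), torch.nn.Linear(256, 64)
        )

    def training_step(self, batch, batch_idx):
        x = batch
        loss = torch.nn.functional.mse_loss(self.net(x), x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)

if __name__ == "__main__":
    data = torch.randn(1024, 64)
    loader = torch.utils.data.DataLoader(data, batch_size=32)
    # Distributed data parallel across all visible GPUs (e.g. 8x A100)
    trainer = pl.Trainer(max_epochs=1, accelerator="gpu", devices=-1,
                         strategy="ddp")
    trainer.fit(TinyGenerator(), loader)
```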
-
-
-
Comparative Analysis of Super Resolution Techniques in Micro-CT Imaging
Authors: M. Awadalla, Y. Ghazal, D. Barradas, A. Ayyad, A. Nasr and S. Ghani
Summary: Image super-resolution is crucial in computer vision, particularly for enhancing micro-CT images, which are essential for detailed scientific analysis. The technique reconstructs high-resolution images from their lower-resolution versions, improving image detail and sharpness. Our study tested three deep learning models (PRIDNet, MW-CNN, and VDSR) on a dataset of 1400 uniformly sized images, downsampled by a factor of three for the experiment. We evaluated model performance using several metrics, including SSIM, MS-SSIM, PSNR, and UIQ, which assess structural similarity, quality across scales, reconstruction error, and image distortion, respectively. A targeted experimental strategy explored different combinations of loss functions to optimize performance. The best results were achieved by adapting the loss function to each model's needs, with combinations such as L1 + MS-SSIM proving effective in enhancing perceptual quality by preserving structural information. This emphasizes the critical role of carefully selecting both the model and its corresponding loss function for superior super-resolution outcomes in specialized imaging applications such as microscopy.
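A hedged sketch of the kind of combined loss mentioned above (L1 + MS-SSIM) follows; it assumes the third-party pytorch_msssim package, and the 0.84 weighting is a common value from the literature rather than the one used in this study.

```python
import torch
from pytorch_msssim import ms_ssim  # third-party package, assumed available

def l1_msssim_loss(pred, target, alpha=0.84):
    """Blend of L1 and (1 - MS-SSIM): the MS-SSIM term preserves structural
    information while the L1 term keeps pixel errors small."""
    l1 = torch.nn.functional.l1_loss(pred, target)
    msssim = ms_ssim(pred, target, data_range=1.0)
    return alpha * (1.0 - msssim) + (1.0 - alpha) * l1

# Example on dummy single-channel patches with intensities in [0, 1]
pred = torch.rand(4, 1, 192, 192, requires_grad=True)
target = torch.rand(4, 1, 192, 192)
loss = l1_msssim_loss(pred, target)
loss.backward()
```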
-
-
-
Accelerating 2-D Full Wavefield Forward Modeling via Frequency Interpolation with a Tiny Attention U-Net Based Model
Authors: J. Zhao, N. Akram, N. Savva and E. Verschuur
Summary: Joint migration inversion (JMI) merges velocity model building with seismic imaging using full wavefield migration (FWM) algorithms. It uses the full seismic reflection response, including multiple scattering, in a unified framework to produce high-resolution images of geological structure. Despite its precision, FWM entails heavy computational demands, large memory requirements, and reliance on powerful computing resources. This paper proposes a deep learning-based seismic interpolation acceleration strategy centered on a tiny Attention U-Net based model. Designed for efficient frequency-domain sparse wavefield interpolation and reconstruction, it reduces the cost of full wavefield forward modeling. Trained on a 2-D lens-shaped velocity model, the model adaptively learns the complex mappings between sparse and complete wavefields, reproducing 2-D numerical simulations while cutting computation time. With up to 50% of the seismic data missing, this approach improves efficiency by about 40% compared with conventional FWM. Integrating the trained model into JMI further saves about 30% of compute time under simplified conditions, confirming its advantages and potential for frequency-domain forward modeling.
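For readers unfamiliar with attention gating in U-Net architectures, the sketch below shows a generic additive attention gate of the kind used in Attention U-Net models; the AttentionGate module and its channel sizes are illustrative assumptions, not the authors' tiny network.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate in the style of Attention U-Net; layer
    sizes are illustrative only."""
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.w_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1),
                                 nn.Sigmoid())

    def forward(self, g, x):
        # g: gating signal from the decoder, x: skip-connection features
        attn = self.psi(torch.relu(self.w_g(g) + self.w_x(x)))
        return x * attn   # suppress irrelevant skip-connection activations

# Example: gate a skip connection of a frequency-slice wavefield tensor
gate = AttentionGate(g_ch=32, x_ch=32, inter_ch=16)
g = torch.randn(1, 32, 64, 64)
x = torch.randn(1, 32, 64, 64)
out = gate(g, x)
```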
-
-
-
Leveraging the High Bandwidth of Last-Level Cache for the First-Order Reverse Time Migration
Authors: P. Plotnitskii
Summary: In this work we present performance results for multicore wavefront diamond tiling blocking (MWD-TB) RTM and show its superiority over traditional spatial-blocking-based RTM. MWD-TB RTM provides a 2X speedup, while the use of the MWD-TB technique in forward modeling provides a 4X speedup. For future work, we plan to deploy our MWD approach for the first-order acoustic wave equation into the full RTM and FWI pipelines on the latest Intel/AMD x86 architectures and GPU hardware accelerators. Please refer to the recent publication on this research [4] for wavefield modeling results on different architectures.
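For context, a didactic Python/NumPy sketch of one explicit time step of the first-order (pressure-velocity) acoustic system is given below; it uses a simple collocated grid with central differences, not the staggered-grid, diamond-tiled MWD-TB kernel evaluated in this work.

```python
import numpy as np

def first_order_acoustic_step(p, vx, vz, kappa, rho, dt, dx):
    """One time step of the first-order acoustic system (didactic sketch)."""
    # Update particle velocities from the pressure gradient
    vx[1:-1, :] -= dt / rho * (p[2:, :] - p[:-2, :]) / (2 * dx)
    vz[:, 1:-1] -= dt / rho * (p[:, 2:] - p[:, :-2]) / (2 * dx)
    # Update pressure from the divergence of the new velocities
    div_v = np.zeros_like(p)
    div_v[1:-1, :] += (vx[2:, :] - vx[:-2, :]) / (2 * dx)
    div_v[:, 1:-1] += (vz[:, 2:] - vz[:, :-2]) / (2 * dx)
    p -= dt * kappa * div_v
    return p, vx, vz

n, dx, dt = 200, 10.0, 1e-3
p = np.zeros((n, n)); p[n // 2, n // 2] = 1.0      # point source
vx, vz = np.zeros_like(p), np.zeros_like(p)
for _ in range(100):
    p, vx, vz = first_order_acoustic_step(p, vx, vz, 2.25e9, 1000.0, dt, dx)
```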
-
-
-
GPU-accelerated Full-Waveform Inversion using Hamiltonian Monte Carlo Method
Authors: D. Urozayev, B. Boddupalli and P. Eliasson
Summary: This study utilizes GPU computing to perform seismic full waveform inversion, a high-dimensional and ill-posed problem. By employing hybrid Hamiltonian Monte Carlo methods, it enables efficient computation of the posterior distribution under various priors, which regularize the problem. The approach facilitates assessing the efficiency of different priors and regularization techniques. As high-performance computing continues to advance, these methods allow for the development of more sophisticated inversion algorithms for large-scale seismic problems, improving uncertainty estimation and aiding decision-making.
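As background, a textbook Hamiltonian Monte Carlo sampler with leapfrog integration is sketched below on a toy Gaussian posterior; it is not the hybrid HMC scheme or the FWI misfit used in the study, and all names and step sizes are illustrative.

```python
import numpy as np

def hmc_sample(log_post, log_post_grad, m0, n_samples, step, n_leapfrog, rng):
    """Plain HMC with leapfrog integration over a model vector m."""
    m = m0.copy()
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(m.shape)            # resample momentum
        m_new, p_new = m.copy(), p.copy()
        p_new += 0.5 * step * log_post_grad(m_new)  # half kick
        for _ in range(n_leapfrog - 1):
            m_new += step * p_new                   # drift
            p_new += step * log_post_grad(m_new)    # full kick
        m_new += step * p_new
        p_new += 0.5 * step * log_post_grad(m_new)  # final half kick
        # Metropolis accept/reject on the total Hamiltonian
        h_old = -log_post(m) + 0.5 * np.dot(p, p)
        h_new = -log_post(m_new) + 0.5 * np.dot(p_new, p_new)
        if rng.random() < np.exp(h_old - h_new):
            m = m_new
        samples.append(m.copy())
    return np.array(samples)

# Toy example: Gaussian posterior standing in for an FWI model posterior
rng = np.random.default_rng(1)
log_post = lambda m: -0.5 * np.sum(m ** 2)
log_post_grad = lambda m: -m
chain = hmc_sample(log_post, log_post_grad, np.zeros(10), 500, 0.1, 20, rng)
```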
-
-
-
Full Injection of Devito Generated Code into Shell’s Wave Equation Library
Authors: J. Van der Holst, D. Datta and A. St-Cyr
Summary: Devito allows the generation of optimized GPU kernels. Recently, we managed to inject Devito-generated GPU kernel code into the Shell Wave Equation Library (SWELL), which is used in RTM and FWI. This is done via interface code consisting of an interface kernel that converts native SWELL data structures to Devito data structures. The generated Devito kernels are, up to some minor scripting, used as is. Benchmarking native SWELL kernels against Devito-generated kernels gives us a measure of how the native SWELL kernels are performing. Not all complicated propagators can be solved by native SWELL kernels, as some of those kernels do not exist; for those propagators it is easier to use generated Devito kernels. We started with a 2nd-order sponge equation, for which we developed the SWELL-Devito interface code and Devito notebooks to generate GPU kernels. The resulting Devito and native SWELL kernels for acoustic constant-density and VTI constant-density propagators are similar in computational speed. Next, the more complicated elastic triclinic propagator (ELTRIVD) was studied. The resulting unoptimized Devito solver for ELTRIVD is around 40-80% slower than the optimized native SWELL counterpart.
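For orientation, a minimal Devito example of generating an operator for a second-order acoustic wave equation is shown below; it follows the standard Devito tutorial pattern and omits the sponge/damping terms, source injection, and the SWELL interface code discussed above.

```python
from devito import Grid, TimeFunction, Eq, Operator, solve

# 2-D grid: 201 x 201 points over a 2 km x 2 km domain (illustrative sizes)
grid = Grid(shape=(201, 201), extent=(2000.0, 2000.0))
u = TimeFunction(name="u", grid=grid, time_order=2, space_order=8)

c = 1500.0                       # constant velocity in m/s
pde = u.dt2 - c**2 * u.laplace   # second-order acoustic wave equation
stencil = Eq(u.forward, solve(pde, u.forward))

# Devito generates an optimized kernel for this update; depending on
# configuration the generated code targets CPUs or GPUs
op = Operator([stencil])
op.apply(time_M=100, dt=1.0e-3)
```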
-
-
-
Exploiting Tensor Cores for Stencil-based PDE Solvers
Authors: V. Le Fevre and H. Ltaief
Summary: In this paper, we investigate the use of Tensor Cores in recent GPUs for stencil computations. Used in many scientific and industrial applications, stencil computations have been extensively optimized on CPUs, but further effort is needed for efficient execution on GPUs. We derive a formulation of the algorithm based on common linear algebra kernels (namely GEMM) to provide a portable solution across GPU generations that makes use of the faster Tensor Cores. We provide preliminary results for the Box-3D27P stencil, exploiting the high low-precision throughput of this hardware.
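The reformulation of a stencil sweep as a GEMM can be illustrated with a simplified 1-D analogue in NumPy: the stencil coefficients are placed in a banded matrix so that one matrix product applies the stencil to many rows at once, which is the kind of operation Tensor Cores accelerate. This is a sketch of the idea only, not the Box-3D27P formulation from the paper.

```python
import numpy as np

def stencil_as_gemm(u, coeffs):
    """Apply a 1-D three-point stencil to every row of u by recasting the
    sweep as a dense matrix product (one GEMM replaces the stencil loop)."""
    n = u.shape[1]
    a = np.zeros((n, n), dtype=u.dtype)
    c_left, c_mid, c_right = coeffs
    idx = np.arange(n)
    a[idx, idx] = c_mid
    a[idx[1:], idx[:-1]] = c_left
    a[idx[:-1], idx[1:]] = c_right
    return u @ a.T

rng = np.random.default_rng(0)
u = rng.standard_normal((512, 256)).astype(np.float32)
out_gemm = stencil_as_gemm(u, (1.0, -2.0, 1.0))
# Reference pointwise stencil for the interior points
ref = u[:, :-2] - 2.0 * u[:, 1:-1] + u[:, 2:]
assert np.allclose(out_gemm[:, 1:-1], ref, atol=1e-4)
```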
-
-
-
Performance Tuning of Seismic Processing Software with Integrated Profiling Tools
Authors: N. Wilson, M. Nauta and L. Casasanta
Summary: This paper outlines the efforts made to integrate HPC profiling tools directly into the user interface of a seismic-processing software package. This endeavor represents a pioneering initiative within the seismic industry, constituting a significant novelty in the processing software landscape. As there is no universal profiling solution capable of offering optimal detail for all scenarios, we meticulously evaluated various profiling tools based on their functionality and usability.
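As a generic illustration of embedding profiling behind a user-interface switch, the Python sketch below wraps a processing step with the standard-library cProfile and returns a text report; the HPC profilers actually evaluated in the paper are not shown, and run_with_profile is a hypothetical helper.

```python
import cProfile
import io
import pstats

def run_with_profile(step_fn, *args, top=10, **kwargs):
    """Run one processing step under cProfile and return (result, report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(step_fn, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()

def demo_step(n):
    # Stand-in for a real processing kernel
    return sum(i * i for i in range(n))

value, report = run_with_profile(demo_step, 1_000_000)
print(report)
```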
-
-
-
Efficient Multidimensional Deconvolution with an H2-Like Parametrization
Authors: D. Sushnikova
Summary: This study presents a new approach to improving the efficiency of Multidimensional Deconvolution (MDD) for seismic wavefield redatuming. While MDD offers more accurate results than traditional methods, it is often limited by high computational demands due to the large and complex matrices involved in the process. We introduce an innovative technique that uses low-rank and H2-like parametrization to compress these matrices, reducing both memory usage and computational costs.
Our method focuses on representing the operator, right-hand side, and unknowns in a low-rank format, allowing for the solution of smaller linear systems in the frequency domain. This approach is tested on 2D and 3D synthetic seismic datasets, demonstrating significant reductions in computational complexity with only a slight decrease in solution quality.
The potential impact of this method is substantial—it could make MDD a more viable tool for large-scale geophysical applications, offering significantly improved efficiency. By using H2-like matrix compression, we enable faster and more resource-effective seismic wavefield reconstructions.
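A minimal NumPy sketch of a low-rank, frequency-domain MDD-style solve is given below, using a truncated-SVD pseudo-inverse as a stand-in for the H2-like compressed representation described above; matrix sizes, names, and the rank are illustrative.

```python
import numpy as np

def lowrank_freq_solve(D, U_data, rank):
    """Solve D X ~= U_data for one frequency slice with a truncated-SVD
    low-rank pseudo-inverse (illustrative stand-in for the compressed
    solves described in the abstract)."""
    u, s, vh = np.linalg.svd(D, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]
    return vh.conj().T @ ((u.conj().T @ U_data) / s[:, None])

# Toy frequency slice: 200 receivers, solve for a 200x200 unknown
rng = np.random.default_rng(0)
D = rng.standard_normal((200, 200)) + 1j * rng.standard_normal((200, 200))
X_true = rng.standard_normal((200, 200))
U_data = D @ X_true
X_est = lowrank_freq_solve(D, U_data, rank=150)
```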
-
-
-
Frequency-Dependent Adaptive Reciprocal Low-Rank Factorization for Multidimensional Deconvolution
Summary: We present an enhanced strategy for multidimensional deconvolution (MDD) that effectively addresses its inherent ill-posed nature. This approach leverages low-rank regularization, which assumes that the unknown Green’s function can be represented with a low-rank structure. By employing lower-rank approximations for lower frequencies and increasing the rank for higher frequencies, our frequency-dependent rank selection achieves a flexible balance between accuracy and memory usage. This adaptive reciprocal low-rank factorization not only improves the accuracy of MDD but also promises significant computational efficiency gains. We demonstrate the effectiveness of our method using a synthetic ocean-bottom cable dataset, paving the way for future applications to large-scale MDD problems.
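The frequency-dependent rank selection can be sketched as follows; the linear rank schedule and the random initialization of the factors are assumptions for illustration and may differ from the rule and initialization used by the authors.

```python
import numpy as np

def rank_for_frequency(freq, f_min, f_max, r_min, r_max):
    """Hypothetical rank schedule: lower frequencies get a smaller rank,
    higher frequencies a larger one (linear ramp, for illustration only)."""
    t = np.clip((freq - f_min) / (f_max - f_min), 0.0, 1.0)
    return int(round(r_min + t * (r_max - r_min)))

rng = np.random.default_rng(0)
n_receivers, n_sources = 300, 300
for freq in (5.0, 20.0, 60.0):
    r = rank_for_frequency(freq, f_min=2.0, f_max=80.0, r_min=10, r_max=120)
    # Green's function slice parametrized as a rank-r product L @ R,
    # initialized randomly and refined during the inversion
    L = rng.standard_normal((n_receivers, r))
    R = rng.standard_normal((r, n_sources))
    print(f"f = {freq:5.1f} Hz -> rank {r}, factors {L.shape} x {R.shape}")
```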
-
-
-
Energy Tuning: Methodology and Exploration
Authors: F. Pautre, A. Hincelin and N. Moller
Summary: Due to the rise in component TDP (Thermal Design Power) and energy costs, Viridien is actively exploring methods to reduce energy consumption or enhance data center throughput while maintaining a consistent energy budget. This presentation builds on two previous ones delivered at the EAGE HPC workshop: the first on GPU power capping in 2019, and the second on energy-efficient calculations in 2022, held in Milan.
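As a minimal sketch of the GPU power-capping idea referenced above (not Viridien's methodology), the following Python snippet uses the nvidia-ml-py (pynvml) bindings to cap each visible GPU at a fraction of its maximum power limit; changing the limit normally requires administrative privileges, and the 0.8 fraction is an arbitrary example value.

```python
import pynvml  # nvidia-ml-py package; setting limits typically needs root

def cap_gpu_power(fraction=0.8):
    """Cap every visible GPU to a fraction of its maximum power limit."""
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
            target_mw = max(min_mw, int(max_mw * fraction))
            pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
            print(f"GPU {i}: power limit set to {target_mw / 1000:.0f} W")
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    cap_gpu_power(fraction=0.8)
```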
-
-
-
GPU Acceleration of Graph Algorithms in NextVision: A Seismic Data Interpretation Tool
Authors: N. Keskes
Summary: We focus on accelerating graph processing for seismic data interpretation using GPUs, particularly through optimization of the Breadth First Search (BFS) algorithm. Seismic interpretation tools like NextVision require processing large graphs, which is traditionally compute-intensive. Initially parallelized on multicore CPUs, the code was slow for large datasets. To improve performance, we implemented GPU acceleration using NVIDIA's cuGraph library. The approach involved optimizing BFS by launching multiple concurrent searches from independent vertices, maximizing parallelism and reducing overhead. Graph data was managed efficiently using the RAPIDS Memory Manager (RMM), and the cuGraph API enabled efficient graph creation and BFS execution without redundant data replication. CUDA multi-streaming was also tuned to improve GPU utilization, and the Thrust API was used to handle dynamic graph updates efficiently. Experimental results showed significant performance improvements, particularly on larger graphs. On a platform with an A100 GPU, speedups of up to 4.95x were achieved for a 44 GB graph. Future work will explore additional algorithms, such as FastAPSP, to further enhance performance. This study demonstrates the potential of GPU acceleration to significantly speed up graph algorithms in seismic data interpretation, benefiting geophysical analysis.
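A minimal cuGraph example of building a graph from an edge list and running BFS from several seed vertices is shown below; it requires a CUDA-capable GPU with the RAPIDS libraries installed, uses a tiny illustrative edge list, and omits the concurrent multi-stream launching and RMM tuning described above.

```python
import cudf
import cugraph  # RAPIDS libraries; require a CUDA-capable GPU

# Build a graph from an edge list (column names here are illustrative)
edges = cudf.DataFrame({
    "src": [0, 0, 1, 2, 2, 3],
    "dst": [1, 2, 2, 3, 4, 4],
})
G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

# One BFS per seed vertex; the paper launches such searches concurrently
# on multiple CUDA streams, which is omitted from this minimal sketch
for seed in (0, 2):
    result = cugraph.bfs(G, start=seed)
    print(result[["vertex", "distance", "predecessor"]].head())
```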
-