Abstract

Summary

This work presents a case study on leveraging NVIDIA Tensor Cores (Hopper H100) to accelerate 3D stencil computations originally written in FP32 by exploiting the TF32 format.
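
As an illustration of what moving FP32 stencil inputs to TF32 implies numerically, the following is a minimal sketch and not the authors' implementation. It emulates TF32 in software by keeping only the top 10 mantissa bits of an FP32 value, then applies a 3-point stencil with the products accumulated at higher precision, loosely mirroring how Tensor Cores take TF32 inputs and accumulate in FP32. The names `to_tf32` and `stencil_1d`, the 1D stencil, the round-to-nearest choice, and the test signal are all assumptions made for this example.

```python
import math
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 storage for a finite value (hypothetical helper).

    The value is first rounded to FP32, then the 13 low mantissa bits are
    dropped so that 10 explicit mantissa bits remain. Rounding to nearest
    is an assumption of this sketch; NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add half of the dropped range, then clear the 13 low mantissa bits.
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def stencil_1d(u, coeffs, quantize=lambda v: v):
    """Apply a centered 1D stencil to the interior points of u.

    quantize is applied to every input before the multiply; the sum is
    kept in the interpreter's native float, standing in for the wider
    accumulator.
    """
    r = len(coeffs) // 2
    out = []
    for i in range(r, len(u) - r):
        acc = 0.0
        for k, c in enumerate(coeffs):
            acc += quantize(c) * quantize(u[i + k - r])
        out.append(acc)
    return out


# Illustrative data: a smooth signal and a second-difference stencil.
u = [math.sin(0.1 * i) for i in range(8)]
coeffs = [1.0, -2.0, 1.0]

reference = stencil_1d(u, coeffs)
reduced = stencil_1d(u, coeffs, quantize=to_tf32)
max_abs_error = max(abs(a - b) for a, b in zip(reference, reduced))
print("max abs error with TF32 inputs:", max_abs_error)
```

Each input carries a relative rounding error of roughly 2^-11, so for this stencil the absolute error is bounded by about the sum of the coefficient magnitudes times 2^-11 times the largest input. Second-difference stencils subtract nearly equal values, which is why the input rounding can be significant relative to the result.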

/content/papers/10.3997/2214-4609.2025643012
2025-10-06
2026-02-11