This work presents a case study on accelerating 3D stencil computations, originally written in FP32, on the Tensor Cores of the NVIDIA Hopper H100 GPU by exploiting the TF32 format.
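TF32 keeps FP32's 8-bit exponent but only 10 explicit mantissa bits, so the main numerical question is how much accuracy a stencil loses when its inputs are rounded to TF32 while products are accumulated in FP32. The following is a minimal CPU-only sketch of that effect, not the method of this work. It makes several assumptions: the helper names (`to_tf32`, `stencil_7pt`) are hypothetical, TF32 conversion is modelled as round-to-nearest with ties away from zero, the 7-point weights (0.4 centre, 0.1 per face neighbour) and the test field are arbitrary, and accumulation is simplified to "round to FP32 after every addition". It says nothing about how the stencil is mapped onto Tensor Core matrix-multiply instructions.

```python
import math
import struct

def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def to_tf32(x: float) -> float:
    """Emulate FP32 -> TF32 by keeping 10 of the 23 stored mantissa bits.

    Assumption: round-to-nearest, ties away from zero (add half of the
    13 dropped bits to the magnitude, then clear them).
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    if (bits & 0x7F800000) == 0x7F800000:  # Inf/NaN: leave unchanged
        return struct.unpack('<f', struct.pack('<I', bits))[0]
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def identity(x: float) -> float:
    return x

def stencil_7pt(u, n, c0, c1, rnd_in, rnd_acc):
    """out = c0*centre + c1*(sum of 6 face neighbours) at interior points.

    rnd_in rounds coefficients and inputs; rnd_acc rounds the running sum.
    """
    a0, a1 = rnd_in(c0), rnd_in(c1)
    out = {}
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                terms = [(a0, u[i][j][k]),
                         (a1, u[i - 1][j][k]), (a1, u[i + 1][j][k]),
                         (a1, u[i][j - 1][k]), (a1, u[i][j + 1][k]),
                         (a1, u[i][j][k - 1]), (a1, u[i][j][k + 1])]
                acc = 0.0
                for c, v in terms:
                    # A product of two TF32 values fits in FP32's 24-bit
                    # significand, so only the running sum needs rounding.
                    acc = rnd_acc(acc + c * rnd_in(v))
                out[(i, j, k)] = acc
    return out

N = 8
# Smooth, strictly positive FP32 test field (arbitrary choice).
u = [[[to_fp32(1.0 + 0.5 * math.sin(0.1 * i + 0.2 * j + 0.3 * k))
       for k in range(N)] for j in range(N)] for i in range(N)]

ref = stencil_7pt(u, N, 0.4, 0.1, identity, identity)   # FP64 reference
tf32 = stencil_7pt(u, N, 0.4, 0.1, to_tf32, to_fp32)    # TF32 in, FP32 acc
max_rel_err = max(abs(tf32[p] - ref[p]) / abs(ref[p]) for p in ref)
print(f"max relative error (TF32 inputs, FP32 accumulation): {max_rel_err:.3e}")
```

With positive weights and a positive field there is no cancellation, so each output can deviate by at most about twice the TF32 unit roundoff (2^-11) in relative terms. Stencils with mixed-sign weights, such as a Laplacian, can amplify this through cancellation, which is why the error has to be checked per application.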