This work explores how to optimize resource utilization in high-performance computing (HPC) environments, using Saudi Aramco’s GHAWAR-1 supercomputer as the testbed. Traditional applications often underutilize HPC resources because of their design and execution models. To address this, the study profiles a compute-intensive numerical reservoir simulation application to understand its performance behavior and improve utilization efficiency. GHAWAR-1 is a top-tier on-premises supercomputer equipped with AMD EPYC 7702 processors and Slingshot interconnects.
Benchmarking with HPL characterized the system's compute capacity and memory bandwidth and informed the setup for subsequent application profiling. Two reservoir simulation models were analyzed. Model-1 showed high L3 cache hit rates and modest memory channel usage, whereas Model-2 fully utilized memory bandwidth and core frequencies, which points to different optimization needs. Further profiling identified two distinct categories of application behavior: compute-bound models, which benefit from an increased number of ranks, and memory-bound models, which are limited by bandwidth rather than core count.
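The two categories described above can be illustrated with a minimal sketch. It assumes profiling yields an L3 hit rate and a memory bandwidth utilization, each expressed as a fraction between 0 and 1. The names `ProfileSample` and `classify`, the 0.8 threshold, and the sample values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ProfileSample:
    """Hypothetical per-model profiling summary (fractions in the range 0..1)."""
    name: str
    l3_hit_rate: float          # share of L3 accesses served from cache
    mem_bw_utilization: float   # share of peak memory bandwidth in use


def classify(sample: ProfileSample, bw_threshold: float = 0.8) -> str:
    """Label a model as memory-bound or compute-bound.

    Assumption: a model whose bandwidth utilization reaches the threshold is
    treated as memory-bound; everything else is treated as compute-bound.
    The threshold is illustrative, not a value reported by the study.
    """
    for value in (sample.l3_hit_rate, sample.mem_bw_utilization):
        if not 0.0 <= value <= 1.0:
            raise ValueError("profile metrics must be fractions in [0, 1]")
    if sample.mem_bw_utilization >= bw_threshold:
        return "memory-bound"
    return "compute-bound"


# Illustrative values only; they are not measurements from GHAWAR-1.
cache_friendly = ProfileSample("cache-friendly", l3_hit_rate=0.95, mem_bw_utilization=0.30)
bandwidth_heavy = ProfileSample("bandwidth-heavy", l3_hit_rate=0.40, mem_bw_utilization=0.98)

for sample in (cache_friendly, bandwidth_heavy):
    print(sample.name, "->", classify(sample))
```

In practice a single threshold is a simplification; a real classification would likely combine several hardware counters collected over the full run.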
Experiments that varied core counts and node configurations revealed a potential 33% gain in resource utilization, although some scenarios showed diminishing returns and runtime penalties. Ultimately, the study emphasizes the need for tailored resource allocation strategies, guided by detailed profiling, to achieve optimal performance in HPC-driven reservoir simulations.
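The trade-off between resource savings and runtime penalties can be made concrete with a short sketch. It assumes that resource cost is measured in node-hours, which is one possible metric and not necessarily the one used in the study. The function names and the example configurations are hypothetical.

```python
def node_hours(nodes: int, runtime_hours: float) -> float:
    """Resource cost of a run, assuming cost scales with nodes times wall time."""
    if nodes <= 0 or runtime_hours <= 0:
        raise ValueError("nodes and runtime_hours must be positive")
    return nodes * runtime_hours


def utilization_tradeoff(base_nodes: int, base_runtime: float,
                         cand_nodes: int, cand_runtime: float) -> dict:
    """Compare a candidate configuration against a baseline.

    Returns the fractional node-hour saving (positive means fewer resources
    consumed) and the fractional runtime penalty (positive means slower).
    """
    base_cost = node_hours(base_nodes, base_runtime)
    cand_cost = node_hours(cand_nodes, cand_runtime)
    return {
        "node_hour_saving": (base_cost - cand_cost) / base_cost,
        "runtime_penalty": (cand_runtime - base_runtime) / base_runtime,
    }


# Illustrative numbers only: halving the node count while the run takes 1.5x longer.
result = utilization_tradeoff(base_nodes=4, base_runtime=2.0,
                              cand_nodes=2, cand_runtime=3.0)
print(f"node-hour saving: {result['node_hour_saving']:.0%}")
print(f"runtime penalty:  {result['runtime_penalty']:.0%}")
```

Reporting both numbers side by side makes it easier to decide whether a configuration that frees resources is still acceptable in terms of time to solution.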