It has previously been shown that a multiprocessing workflow is highly beneficial for GPU-based reservoir simulation, particularly for smaller models. In this analysis, we evaluate the efficiency of multiprocessing across GPU generations (both consumer and professional grade) and vendors by measuring the maximum achievable total simulation throughput on synthetic and real models of various sizes and physics. We demonstrate that, for both vendors, total throughput increases as more processes run simultaneously and peaks at a process count that depends on the model size and GPU generation. For later generations of NVIDIA GPUs this count tends to increase, indicating that more processes are needed to saturate the increased memory bandwidth, whereas for AMD GPUs it remains stable.
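As an illustration of how the peak process count described above might be located from timing measurements, the following is a minimal sketch rather than the procedure used in this work. It assumes that total throughput is defined as the number of concurrent simulations divided by the wall time needed for all of them to finish. The function names and the timing values are hypothetical placeholders, not measured results.

```python
def total_throughput(n_processes, wall_seconds):
    """Simulations completed per hour when n_processes run concurrently.

    Assumed definition: n_processes identical simulations start together
    and wall_seconds is the time until the last one finishes.
    """
    if n_processes <= 0 or wall_seconds <= 0:
        raise ValueError("n_processes and wall_seconds must be positive")
    return n_processes * 3600.0 / wall_seconds


def peak_process_count(timings):
    """Return (process count, throughput) at which throughput is highest.

    timings maps a process count to the measured wall time in seconds.
    """
    throughputs = {n: total_throughput(n, t) for n, t in timings.items()}
    best = max(throughputs, key=throughputs.get)
    return best, throughputs[best]


# Placeholder timings for illustration only (not measured data).
example_timings = {1: 100.0, 2: 120.0, 4: 160.0, 8: 400.0}
best_n, best_tp = peak_process_count(example_timings)
print(f"peak at {best_n} processes: {best_tp:.1f} simulations/hour")
```

With these placeholder values, throughput rises up to four processes and falls at eight, which is the qualitative shape of the curve described in the text.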
We also show that, in most cases, a single process running on a more recent GPU architecture struggles to achieve a runtime improvement over an older architecture that matches the increase in total memory bandwidth. By contrast, when total simulation throughput under multiprocessing is compared, performance is much closer to the expected level.
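One way to quantify the comparison above is to divide the observed performance gain between two GPUs by the ratio of their memory bandwidths. The sketch below assumes this definition of scaling efficiency, which is an illustrative choice rather than a metric taken from this work. All numbers are hypothetical placeholders, not measurements.

```python
def scaling_efficiency(perf_old, perf_new, bw_old, bw_new):
    """Observed speedup divided by the memory bandwidth ratio.

    perf_* are throughputs (higher is better) on the old and new GPU;
    bw_* are their memory bandwidths in the same units.
    A value of 1.0 means performance scaled exactly with bandwidth.
    """
    if min(perf_old, perf_new, bw_old, bw_new) <= 0:
        raise ValueError("all inputs must be positive")
    speedup = perf_new / perf_old
    bandwidth_ratio = bw_new / bw_old
    return speedup / bandwidth_ratio


# Placeholder values for illustration only (not measured data).
bw_old, bw_new = 500.0, 1000.0  # GB/s, hypothetical GPUs

# Single process: throughput in simulations per hour.
single = scaling_efficiency(10.0, 14.0, bw_old, bw_new)

# Multiprocessing: peak total throughput on each GPU.
multi = scaling_efficiency(30.0, 57.0, bw_old, bw_new)

print(f"single-process efficiency: {single:.2f}")
print(f"multiprocessing efficiency: {multi:.2f}")
```

In this made-up example the single-process run recovers only part of the bandwidth increase, while the multiprocessing comparison comes close to 1.0, mirroring the qualitative claim in the text.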