Full waveform inversion (FWI) is a computationally intensive task typically performed on high-performance computing (HPC) systems. Traditional aggregated workflows are rigid, susceptible to failures, and underutilize HPC infrastructure. Moreover, they cannot simultaneously exploit heterogeneous hardware, such as systems with both CPUs and GPUs.
SEIS_ORC addresses these challenges by implementing a disaggregated approach to FWI, breaking the workflow into smaller, more manageable jobs. Acting as a layer above the FWI code, the orchestrator pilots job submissions, handles partial gradient computations, and ensures fault tolerance through automatic resubmissions. Written in Python, the tool can run on frontend or backend nodes of supercomputer environments.
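The disaggregated pattern described above can be illustrated with a minimal Python sketch. It is not the SEIS_ORC implementation: every name here (`split_shots`, `run_with_resubmission`, `orchestrate`, `make_flaky_job`, `JobFailed`) is hypothetical, the "job" is an in-process function standing in for a scheduler submission, and gradients are plain lists rather than files on disk. The sketch only shows the three ideas named in the text: splitting the work into small jobs, resubmitting failed jobs automatically, and summing partial gradients.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Gradient = List[float]
Job = Callable[[Sequence[int]], Gradient]


class JobFailed(Exception):
    """Hypothetical signal that a submitted job did not finish."""


def split_shots(shot_ids: Sequence[int], shots_per_job: int) -> List[List[int]]:
    """Break the shot list into chunks; shots_per_job is the granularity."""
    if shots_per_job < 1:
        raise ValueError("shots_per_job must be at least 1")
    return [list(shot_ids[i:i + shots_per_job])
            for i in range(0, len(shot_ids), shots_per_job)]


def run_with_resubmission(job: Job, shots: Sequence[int],
                          max_attempts: int = 3) -> Gradient:
    """Run one job, resubmitting it after a failure up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job(shots)
        except JobFailed:
            if attempt == max_attempts:
                raise  # give up: only this chunk is lost, not the whole run
    raise AssertionError("unreachable")


def orchestrate(shot_ids: Sequence[int], shots_per_job: int, job: Job,
                max_attempts: int = 3) -> Gradient:
    """Sum the partial gradients returned by all chunked jobs."""
    partials = [run_with_resubmission(job, chunk, max_attempts)
                for chunk in split_shots(shot_ids, shots_per_job)]
    total = [0.0] * len(partials[0])
    for partial in partials:
        for i, value in enumerate(partial):
            total[i] += value
    return total


def make_flaky_job() -> Job:
    """Toy job: fails on the first attempt for each chunk, then succeeds."""
    seen: Dict[Tuple[int, ...], int] = {}

    def job(shots: Sequence[int]) -> Gradient:
        key = tuple(shots)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] == 1:
            raise JobFailed(f"simulated failure for shots {key}")
        # Toy partial gradient: each shot s contributes [s, 2 * s].
        return [float(sum(shots)), float(2 * sum(shots))]

    return job


if __name__ == "__main__":
    print(orchestrate([1, 2, 3, 4, 5], shots_per_job=2, job=make_flaky_job()))
```

In this toy setting a failure costs only the rerun of one small chunk, which is the fault-tolerance argument made for the disaggregated workflow. A real orchestrator would submit jobs to a batch scheduler and exchange partial gradients through storage.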
This approach improves fault tolerance, optimizes resource utilization, and reduces wait times. However, it increases disk input/output due to the storage and retrieval of partial gradients. Performance tests on multiple supercomputers demonstrate that granularity settings affect both overhead and computational resource usage. Small granularities can potentially reduce wait times and computational resource consumption.
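The I/O side of this trade-off can be made concrete with a small back-of-the-envelope sketch. It rests on an assumption not stated in the text: each job writes exactly one partial gradient to disk, and each partial gradient is read back once for summation. The function name `job_plan` and the shot counts are illustrative only.

```python
import math
from typing import Dict


def job_plan(n_shots: int, shots_per_job: int) -> Dict[str, int]:
    """Count jobs and partial-gradient disk operations for a given granularity.

    Assumes one partial gradient written per job and read back once.
    """
    if n_shots < 0 or shots_per_job < 1:
        raise ValueError("n_shots must be >= 0 and shots_per_job >= 1")
    n_jobs = math.ceil(n_shots / shots_per_job)
    return {"jobs": n_jobs, "writes": n_jobs, "reads": n_jobs}


if __name__ == "__main__":
    # Illustrative survey of 1000 shots at two granularities.
    print(job_plan(1000, 100))
    print(job_plan(1000, 10))
```

Under these assumptions, a tenfold reduction in shots per job produces ten times as many partial gradients to store and retrieve, which is the overhead that has to be weighed against shorter queue waits.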