
Abstract

Summary

As the industry shifts to more computationally intensive, data-driven applications, the need for scalable and efficient processing power grows with it. Running such applications in the cloud is the obvious solution, as resources can scale with the requirements and stage of the project. We propose an Infrastructure as Code (IaC) environment, S-Cube Cloud (SCC), to launch and control the large volumes of computational resources needed for new seismic processing applications. To leverage the cloud effectively, spot instances must be utilised; these are offered at a large discount but may be interrupted at any time. A key limitation we address is the absence of an efficient, fault-tolerant, cloud-native parallelisation scheme, without which the use of discounted spot instances is not achievable. We propose RIPS(SCI), the Robust Inter Process Simple Socket Communication Interface, whose fault tolerance allows spot instances to be utilised. Applied in real-world conditions, RIPS communicates between thousands of instances and handles spot instance interruptions. Furthermore, RIPS relieves major bottlenecks in the master process, which, in contrast to MPI, no longer has to process terabytes of data per iteration. Savings of 70%–80% are observed in customer processing workflows using spot instances enabled by RIPS.
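
The fault-tolerance idea described above, where work assigned to an interrupted spot instance is not lost but handed to another worker, can be illustrated with a small simulation. This is only a sketch under stated assumptions and not the RIPS implementation: it uses no sockets or cloud APIs, and the function `run_with_requeue`, its parameters, and the squaring "workload" are hypothetical names chosen for illustration.

```python
import random
from collections import deque


def run_with_requeue(tasks, n_workers, interrupt_prob, seed=0):
    """Process every task despite simulated worker interruptions.

    Hypothetical illustration only. In each round up to n_workers tasks
    are handed out. With probability interrupt_prob a worker is
    "reclaimed" before it finishes; its task is then put back on the
    queue so that a later round (a replacement worker) can retry it.
    Requires interrupt_prob < 1, otherwise no task can ever finish.
    """
    rng = random.Random(seed)      # seeded so the simulation is repeatable
    pending = deque(tasks)         # tasks not yet completed
    results = {}                   # task -> result
    interruptions = 0

    while pending:
        # Hand out one task per available worker.
        batch = [pending.popleft() for _ in range(min(n_workers, len(pending)))]
        for task in batch:
            if rng.random() < interrupt_prob:
                # Simulated spot interruption: requeue instead of failing the job.
                interruptions += 1
                pending.append(task)
            else:
                results[task] = task * task   # stand-in for real processing
    return results, interruptions


if __name__ == "__main__":
    results, interruptions = run_with_requeue(range(20), n_workers=4, interrupt_prob=0.3)
    print(f"completed {len(results)} tasks with {interruptions} interruptions")
```

The point of the sketch is the design choice rather than the numbers: because an interrupted task goes back on the queue instead of aborting the whole job, the run still completes every task, which is the property that a scheme without fault tolerance lacks when instances can disappear at any time.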

/content/papers/10.3997/2214-4609.202332071
2023-03-20
2024-03-28
References

  1. The Open MPI Forum [2022] FAQ: Fault tolerance for parallel MPI jobs. The Open MPI Project.
  2. Guasch, L., Warner, M. and Ravaut, C. [2019] Adaptive waveform inversion: Practice. Geophysics, 84, R447–R461.