Traditional machine learning workflows for seismic interpretation rely on iterative training and inference on individual datasets, producing models that fail to generalise beyond their training domain. Self-supervised training and scaling of 3D vision transformer (ViT) architectures enable seismic interpretation with improved generalisation across diverse datasets. We address the complexities of large-scale training on a global dataset of 63 seismic volumes using the masked autoencoder (MAE) architecture with the ViT-H model, consisting of 660 million parameters. We leverage a cloud-native, digitalised seismic data infrastructure to address the data engineering challenges while avoiding data duplication. For a downstream task, a salt segmentation model trained using interpretation labels from the Gulf of Mexico and Brazil demonstrated zero-shot generalisation on a West African survey. These findings underscore the potential of pre-trained foundation models to overcome the limitations of iterative approaches and extend seismic interpretation across diverse basins, marking a significant advancement in scalable machine learning for subsurface challenges.
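The MAE pre-training described above rests on a simple pretext task: tokenise the 3D seismic volume into patches, hide most of them, and train the ViT to reconstruct the hidden voxels. The sketch below illustrates only the patchification and random-masking step in NumPy; the patch size (16³ voxels) and mask ratio (75%) are assumptions taken from common MAE practice, not values stated in this abstract.

```python
import numpy as np

def patchify_3d(volume, patch=16):
    """Split a 3D volume into non-overlapping patch tokens (MAE input)."""
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    v = volume.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    v = v.transpose(0, 2, 4, 1, 3, 5)          # group the three patch axes last
    return v.reshape(-1, patch ** 3)            # (num_tokens, voxels_per_patch)

def random_mask(tokens, mask_ratio=0.75, rng=None):
    """MAE-style random masking: keep only a small visible subset of tokens.

    The encoder sees just the visible tokens; the decoder reconstructs
    the masked ones from their positions.
    """
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    return tokens[keep_idx], keep_idx

# Hypothetical toy volume standing in for one seismic cube.
vol = np.random.default_rng(42).standard_normal((64, 64, 64)).astype(np.float32)
tokens = patchify_3d(vol)                       # 4*4*4 = 64 tokens of 16**3 voxels
visible, keep_idx = random_mask(tokens)         # 16 visible tokens at 75% masking
print(tokens.shape, visible.shape)              # → (64, 4096) (16, 4096)
```

At production scale the same tokenisation would feed a ViT-H encoder over the visible tokens only, which is what makes a high mask ratio computationally attractive for 3D data.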