The study introduces a unified multi-modal transformer–GAN framework for high-resolution reservoir property modeling that integrates seismic, well-log, and core data. Unlike conventional geostatistical methods that treat each data source independently, the approach fuses the modalities through cross-modal attention to capture geological context and multi-scale spatial dependencies. The workflow comprises seismic feature extraction using a 3D convolutional autoencoder, well-log processing via temporal convolution and self-attention, and uncertainty-aware core embedding through Gaussian-process priors. These representations are fused into a context vector that guides conditional generation of porosity and permeability cubes. Physical consistency is enforced through facies-preserving, realism, and physics-based regularization losses.
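The abstract does not specify the exact fusion mechanism, but the idea of cross-modal attention producing a single context vector from per-modality embeddings can be sketched minimally. The snippet below is an illustrative numpy sketch, not the paper's implementation: the modality names, the pooled query, and the embedding dimension `d` are all assumptions. The attention weights correspond conceptually to the modality-dominance values discussed in the results.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # shared embedding dimension (assumed for illustration)

# Hypothetical per-modality embeddings standing in for the paper's encoders:
# seismic features (3D conv autoencoder), well-log features (temporal
# convolution + self-attention), and core embeddings (GP priors).
seismic = rng.normal(size=(1, d))
well_log = rng.normal(size=(1, d))
core = rng.normal(size=(1, d))

def cross_modal_attention(query, tokens, d_k):
    """Scaled dot-product attention: one query attends over modality tokens."""
    scores = query @ tokens.T / np.sqrt(d_k)   # (1, n_modalities)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over modalities
    context = weights @ tokens                 # (1, d) fused context vector
    return context, weights

tokens = np.vstack([seismic, well_log, core])  # one token per modality
query = tokens.mean(axis=0, keepdims=True)     # simple pooled query (assumed)
context, weights = cross_modal_attention(query, tokens, d)

print(weights.round(3))  # per-modality attention weights, summing to 1
print(context.shape)     # fused context vector shape
```

In the full model such weights would be learned end-to-end; inspecting them per spatial location is what enables the kind of attention-weight analysis the study reports (e.g. seismic dominating in sand-rich zones).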
Tested on a North Sea carbonate field, the method achieved a porosity MAE of 0.087 and a permeability correlation of 0.82, outperforming seismic-only, log-only, and co-kriging baselines. The model reproduced thin-bedded structures (1–3 m thick) with 89% vertical accuracy and 94% porosity–facies alignment, while cutting inference time to under one second per model, compared with hours for kriging. Attention-weight analysis showed that modality dominance varied with geological conditions: seismic in sand-rich zones, logs in structurally complex intervals, and cores in thin beds. The study establishes transformer-driven multi-modal fusion as a scalable, interpretable, and geologically consistent paradigm for reservoir characterization.