The oil and gas industry continues to digitise subsurface data, yet much of the high-value information remains trapped in unstructured formats such as scanned final well reports, daily drilling reports, and biostratigraphy analyses. While the OSDU® Data Platform standardises structured datasets, unlocking value from unstructured records requires semantic enrichment and rigorous security.
This article presents a scalable, entitlement-first Retrieval-Augmented Generation (RAG) architecture that transforms unstructured, OSDU-referenced content into actionable intelligence. The approach combines document reconstruction, header-aware chunking, and hybrid retrieval with pre-retrieval filtering that maps Entra ID identities to OSDU Access Control Lists (ACLs). Hybrid retrieval combines Best Matching 25 (BM25) keyword search with vector search and fuses the two rankings via Reciprocal Rank Fusion (RRF). On a curated 250-question pilot set representative of subsurface workflows, semantic reconstruction and hybrid retrieval improved recall and precision by up to 20% relative to a naïve baseline and were reported to reduce time-to-answer. The article explains how RAG grounds generation in retrieved sources and how ReAct agents can orchestrate multi-step decision support on top of this trusted foundation.
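A minimal, hypothetical sketch of entitlement-first hybrid retrieval follows. It is not the article's implementation. It assumes each chunk carries the union of its parent OSDU record's `acl.viewers` and `acl.owners` groups, and that the caller's Entra ID identity has already been resolved to a set of OSDU group names. The `ENTRA_TO_OSDU_GROUPS` dictionary, the group names, and the chunk data are illustrative; in a real deployment group membership would come from the identity and entitlements layer. Chunks the user cannot read are removed before either ranker runs. A toy Okapi BM25 ranker and a cosine-similarity ranker then score the remaining chunks, and RRF combines their rankings as `score(d) = sum over rankers of 1 / (k + rank)`, using the constant k = 60 that is common in the RRF literature. A production system would use a search engine and an embedding model in place of these in-memory stand-ins.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    embedding: tuple[float, ...]
    acl_groups: frozenset[str]  # assumed: union of the parent record's acl.viewers and acl.owners


# Hypothetical mapping from an Entra ID object ID to the OSDU groups it belongs to.
ENTRA_TO_OSDU_GROUPS = {
    "entra-user-123": frozenset({"data.wells.viewers@example.com"}),
}


def allowed_chunks(chunks, user_groups):
    """Pre-retrieval filter: keep only chunks sharing at least one ACL group with the user."""
    return [c for c in chunks if c.acl_groups & user_groups]


def bm25_rank(query, chunks, k1=1.2, b=0.75):
    """Rank chunk IDs by Okapi BM25 (non-negative IDF variant); non-matching chunks are dropped."""
    docs = [c.text.lower().split() for c in chunks]
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scored = []
    for chunk, doc in zip(chunks, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
                norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
                score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, chunk.chunk_id))
    return [cid for _, cid in sorted(scored, key=lambda s: s[0], reverse=True)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vector_rank(query_embedding, chunks):
    """Rank chunk IDs by cosine similarity to the query embedding."""
    scored = [(cosine(query_embedding, c.embedding), c.chunk_id) for c in chunks]
    return [cid for _, cid in sorted(scored, key=lambda s: s[0], reverse=True)]


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over all lists, with ranks starting at 1."""
    fused = Counter()
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] += 1.0 / (k + rank)
    return [cid for cid, _ in fused.most_common()]


def hybrid_search(query, query_embedding, chunks, entra_id, top_k=3):
    """Filter by entitlements first, then fuse BM25 and vector rankings with RRF."""
    user_groups = ENTRA_TO_OSDU_GROUPS.get(entra_id, frozenset())  # unknown users see nothing
    candidates = allowed_chunks(chunks, user_groups)
    lexical = bm25_rank(query, candidates)
    semantic = vector_rank(query_embedding, candidates)
    return rrf_fuse([lexical, semantic])[:top_k]


corpus = [
    Chunk("c1", "final well report casing depth summary", (1.0, 0.0),
          frozenset({"data.wells.viewers@example.com"})),
    Chunk("c2", "daily drilling report mud weight casing", (0.8, 0.6),
          frozenset({"data.wells.viewers@example.com"})),
    Chunk("c3", "biostratigraphy analysis casing depth", (1.0, 0.0),
          frozenset({"data.restricted.viewers@example.com"})),
]

print(hybrid_search("casing depth", (1.0, 0.0), corpus, "entra-user-123"))  # ['c1', 'c2']
```

Because filtering happens before ranking, the restricted chunk `c3` never enters either ranked list, so it cannot surface in the fused results or reach the generation prompt, even though it matches the query well. RRF works only on rank positions, which avoids having to normalise BM25 scores and cosine similarities onto a common scale.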
Overall, the study outlines a practical path from unstructured ‘text soup’ to compliant, auditable answers suitable for enterprise deployment at scale.