Volume 44, Issue 2
  • ISSN: 0263-5046
  • E-ISSN: 1365-2397

Abstract

The oil and gas industry continues to digitise subsurface data, yet much of the high-value information remains trapped in unstructured formats such as scanned final well reports, daily drilling reports, and biostratigraphy analyses. While the OSDU® Data Platform standardises structured datasets, unlocking value from unstructured records requires semantic enrichment and rigorous security.

This article presents a scalable, entitlement-first Retrieval-Augmented Generation (RAG) architecture that transforms unstructured, OSDU-referenced content into actionable intelligence. The approach combines document reconstruction, header-aware chunking, and hybrid retrieval with pre-retrieval filtering that maps Entra ID identities to OSDU Access Control Lists (ACLs). Hybrid retrieval fuses Best Matching 25 (BM25) keyword search and vector search via Reciprocal Rank Fusion (RRF). On a curated 250-question pilot set representative of subsurface workflows, semantic reconstruction and hybrid retrieval improved recall and precision by up to 20% relative to a naïve baseline, with reported reductions in time-to-answer. The article also explains how RAG grounds generation in retrieved source content and how ReAct (Reasoning and Acting) agents can orchestrate multi-step decision support on top of this trusted foundation.

Overall, the study outlines a practical path from unstructured ‘text soup’ to compliant, auditable answers suitable for enterprise deployment at scale.
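The abstract names header-aware chunking as one step in turning reconstructed reports into retrievable units. The article's exact chunking rules are not given here, so the following is only a minimal sketch under stated assumptions: the reconstructed document is Markdown-style text with `#` headings, each chunk carries the breadcrumb of its enclosing headings, and long sections are packed paragraph by paragraph up to a hypothetical `max_chars` limit. The names `Chunk`, `header_aware_chunks` and `for_indexing` are illustrative, not part of any published implementation.

```python
import re
from dataclasses import dataclass

# Assumption: reconstructed documents use Markdown ATX headings, e.g. "## Casing".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass(frozen=True)
class Chunk:
    heading_path: tuple[str, ...]  # breadcrumb of enclosing headings
    text: str

    def for_indexing(self) -> str:
        # Prefix the breadcrumb so the chunk keeps its section context
        # when it is embedded or keyword-indexed in isolation.
        crumb = " > ".join(self.heading_path)
        return f"{crumb}\n{self.text}" if crumb else self.text


def header_aware_chunks(markdown: str, max_chars: int = 1200) -> list[Chunk]:
    """Split text at headings, then pack paragraphs up to max_chars per chunk.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: list[Chunk] = []
    path: list[str] = []
    body: list[str] = []

    def flush() -> None:
        # Emit the body collected under the current heading path.
        text = "\n".join(body).strip()
        body.clear()
        if not text:
            return
        current = ""
        for para in re.split(r"\n\s*\n", text):
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(Chunk(tuple(path), current))
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(Chunk(tuple(path), current))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before the path changes
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks
```

For a report reconstructed as `# Final Well Report` with `## Casing` and `## Cementing` subsections, the casing paragraph is indexed as `Final Well Report > Casing` followed by its text, so a query about casing depth can still match it even though the paragraph itself never mentions the report title.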
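The hybrid retrieval step described in the abstract combines a BM25 keyword ranking and a vector-similarity ranking with Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the rankings it appears in. The sketch below is a self-contained illustration, not the article's implementation: it assumes whitespace tokenisation, uses toy vectors in place of a real embedding model, applies the Lucene-style non-negative BM25 IDF with the common defaults k1 = 1.5 and b = 0.75, and uses k = 60, the constant from the original RRF paper. Function names are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every tokenised document for the query tokens."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (as used in Lucene).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def hybrid_search(query_tokens: list[str], query_vec: list[float],
                  doc_tokens: list[list[str]], doc_vecs: list[list[float]],
                  top_k: int = 10) -> list[int]:
    lexical = bm25_scores(query_tokens, doc_tokens)
    semantic = [cosine(query_vec, v) for v in doc_vecs]
    # The BM25 list keeps only documents sharing at least one query term.
    bm25_rank = sorted((i for i, s in enumerate(lexical) if s > 0),
                       key=lambda i: lexical[i], reverse=True)
    vector_rank = sorted(range(len(doc_vecs)), key=lambda i: semantic[i], reverse=True)
    return rrf([bm25_rank, vector_rank])[:top_k]
```

RRF only uses rank positions, so the unbounded BM25 scores and the bounded cosine similarities never need to be normalised onto a common scale. A document that ranks well in both lists, such as a drilling report that contains the exact term "casing" and is also semantically close to the query, rises above documents found by only one retriever.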
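The entitlement-first design maps Entra ID identities to OSDU ACLs and filters before retrieval, which is what makes the answers compliant and auditable. OSDU records carry an `acl` object with `viewers` and `owners` group lists. The sketch below assumes those lists are copied onto each indexed chunk at ingestion time. It also uses a hard-coded, purely hypothetical `ENTRA_TO_OSDU` table in place of whatever identity resolution the article uses, and treats membership in either list as read access. All group IDs, the `opendes.example.com` domain and the names `IndexedChunk` and `prefilter` are illustrative.

```python
from dataclasses import dataclass

# Hypothetical mapping from Entra ID security-group IDs to OSDU data groups.
# A real deployment would resolve this from the identity provider and OSDU
# entitlements rather than a hard-coded table.
ENTRA_TO_OSDU: dict[str, set[str]] = {
    "entra-grp-drilling": {"data.drilling.viewers@opendes.example.com"},
    "entra-grp-subsurface": {"data.subsurface.viewers@opendes.example.com"},
}


@dataclass(frozen=True)
class IndexedChunk:
    chunk_id: str
    text: str
    # Assumed to be copied from the source record's acl.viewers / acl.owners.
    acl_viewers: frozenset[str]
    acl_owners: frozenset[str] = frozenset()


def osdu_groups_for(entra_group_ids: list[str]) -> set[str]:
    """Translate a caller's Entra ID groups into OSDU data groups (deny by default)."""
    groups: set[str] = set()
    for gid in entra_group_ids:
        groups |= ENTRA_TO_OSDU.get(gid, set())
    return groups


def is_entitled(chunk: IndexedChunk, user_groups: set[str]) -> bool:
    # Assumption: membership in either the viewers or owners group grants read access.
    return not (chunk.acl_viewers | chunk.acl_owners).isdisjoint(user_groups)


def prefilter(chunks: list[IndexedChunk], entra_group_ids: list[str]) -> list[IndexedChunk]:
    """Drop every chunk the caller may not read before any ranking happens."""
    user_groups = osdu_groups_for(entra_group_ids)
    return [c for c in chunks if is_entitled(c, user_groups)]
```

Because the filter runs before BM25 and vector ranking, restricted chunks never enter the candidate set. They cannot shift the ranks of permitted documents or leak into the generation prompt. In a production search index, the same check would more likely be expressed as a filter on an indexed ACL field than as a Python loop.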


DOI: 10.3997/1365-2397.fb2026014
2026-02-01
2026-02-16
