This study presents a **Graph-based Retrieval-Augmented Generation (Graph RAG)** framework for retrieving information from legacy **geoscience and drilling reports**, which are often unstructured or scanned. Using the **1995 Statoil geological summary report** as a case study, the text was extracted via **GPT-4o OCR**, cleaned, and divided into page-wise chunks. Each chunk was represented as a **node** in a knowledge graph built using **LightRAG**, with **edges** capturing semantic and conceptual relationships between geological entities such as formations, lithology, and drilling operations. Unlike conventional RAG systems that rely solely on vector similarity, Graph RAG performs **graph-based retrieval and traversal**, which lets it capture contextual and relational dependencies across document sections. Experimental results show that Graph RAG achieves **higher retrieval accuracy and completeness** than a naïve RAG baseline, recovering formation-level information that the baseline missed. The approach demonstrates the value of **relationship-aware retrieval** in geoscience applications and offers a scalable framework for interpreting and querying legacy petroleum reports. Future work will focus on integrating **geological ontologies** and extending the pipeline to multi-document collections to improve exploration data management.
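The difference between chunk-only retrieval and graph traversal can be illustrated with a small, self-contained sketch. It does not use LightRAG and is not the study's implementation: the page texts, the entity list, the "Alpha Formation" name, and all function names are hypothetical, and simple entity matching stands in for both LLM-based entity extraction and vector similarity.

```python
from collections import defaultdict

# Hypothetical page-wise chunks (invented text, not taken from the report).
PAGES = {
    1: "Well summary: drilling operations reached total depth in the Alpha Formation.",
    2: "The Alpha Formation consists of sandstone with minor shale.",
    3: "Sandstone porosity was measured on core plugs.",
    4: "Appendix: list of abbreviations.",
}

# Hypothetical entity vocabulary; a real pipeline would extract entities with an LLM.
ENTITIES = ["alpha formation", "sandstone", "shale", "drilling operations", "porosity"]


def extract_entities(text):
    """Return the known entities mentioned in the text (case-insensitive)."""
    lowered = text.lower()
    return {entity for entity in ENTITIES if entity in lowered}


def build_graph(pages):
    """Treat each page as a node; link two pages that share at least one entity."""
    page_entities = {page: extract_entities(text) for page, text in pages.items()}
    graph = defaultdict(set)
    ids = sorted(pages)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if page_entities[a] & page_entities[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph, page_entities


def naive_retrieve(query, page_entities):
    """Stand-in for similarity search: pages that directly match a query entity."""
    wanted = extract_entities(query)
    return sorted(page for page, ents in page_entities.items() if ents & wanted)


def graph_retrieve(query, graph, page_entities, hops=1):
    """Start from the direct matches, then add neighbours up to `hops` edges away."""
    found = set(naive_retrieve(query, page_entities))
    frontier = set(found)
    for _ in range(hops):
        frontier = {nbr for page in frontier for nbr in graph[page]} - found
        found |= frontier
    return sorted(found)


if __name__ == "__main__":
    graph, page_entities = build_graph(PAGES)
    question = "What is the porosity?"
    print(naive_retrieve(question, page_entities))         # [3]
    print(graph_retrieve(question, graph, page_entities))  # [2, 3]
```

In this toy example the direct match returns only the page that mentions porosity, while one hop of traversal also pulls in the page that names the formation the sandstone belongs to, which is the kind of formation-level context the abstract describes naïve RAG missing.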