Large Language Models (LLMs) have demonstrated remarkable capabilities in the contextual comprehension and generation of natural language. However, their performance in specialized domains such as geoscience can be limited by unfamiliarity with domain-specific terminology and concepts. Retrieval-Augmented Generation (RAG) has recently become a popular method for improving answer quality by integrating external knowledge bases. In 2024, GraphRAG methods extended RAG by using graph-based structures to capture complex relationships between entities. In this work, we leverage these technologies to build a robust chatbot for retrieving and using geoscientific information. Specifically, we integrated advanced RAG techniques with graph-based retrieval and an agentic architecture based on the ReAct framework to improve the performance of GPT-4o in processing geoscientific texts. We then conducted a benchmark using public geo-characterization reports from the Pilot Strategy project focusing on CO2 storage, together with a comprehensive set of technical questions. We observed that the GPT-4o configuration with GraphRAG significantly outperformed the other models, giving answers that were more accurate, detailed, and contextually relevant, particularly in complex geoscientific scenarios. This work thus highlights the potential of combining agentic frameworks with graph-enhanced retrieval methods to develop advanced tools for efficient information extraction in geoscience and other complex fields.