Abstract

Summary

Large Language Models (LLMs) have demonstrated remarkable capabilities in contextual comprehension and generation of natural language. However, their performance in specialized domains such as geoscience can be limited by unfamiliarity with domain-specific terminology and concepts. Retrieval-Augmented Generation (RAG) has recently become a popular method for improving answer quality by integrating external knowledge bases. In 2024, GraphRAG methods extended RAG by using graph-based structures to capture complex relationships between entities. In this work, we leverage these technologies to build a robust chatbot for retrieving and using geoscientific information. In practice, we integrated advanced RAG techniques with graph-based retrieval and an agentic architecture based on the ReAct framework to improve the performance of GPT-4o in processing geoscientific texts. We then ran a benchmark using public geo-characterization reports from the Pilot Strategy project, which focuses on CO2 storage, together with a comprehensive set of technical questions. We observed that GPT-4o configured with GraphRAG significantly outperformed the other models, giving answers that were more accurate, detailed, and contextually relevant, particularly in complex geoscientific scenarios. This work thus highlights the potential of integrating agentic frameworks and graph-enhanced retrieval methods to develop advanced tools for efficient information extraction in geoscience and other complex fields.
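As a rough illustration of the graph-enhanced retrieval idea described above, the following minimal sketch expands a question's terms with neighbouring entities from a small graph before ranking text chunks. It is not the pipeline used in this work: the corpus, the entity graph, the stopword list, and every function name (`tokenize`, `expand_entities`, `retrieve`, `build_prompt`) are hypothetical, and simple keyword overlap stands in for the embedding-based retrieval a real RAG system would likely use.

```python
# Toy sketch of graph-expanded retrieval. All data and names are hypothetical.

# Hypothetical corpus: chunk id -> text (not taken from the benchmark reports).
CHUNKS = {
    "c1": "The reservoir sandstone has good porosity and permeability.",
    "c2": "The caprock shale above the reservoir acts as a seal.",
    "c3": "Seismic surveys were used to map faults near the storage site.",
}

# Hypothetical entity graph: entity -> directly related entities.
GRAPH = {
    "reservoir": {"caprock", "porosity"},
    "caprock": {"reservoir", "seal"},
    "faults": {"seismic"},
}

STOPWORDS = {"the", "a", "is", "as", "and", "to", "were", "has", "what"}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = {w.strip(".,?").lower() for w in text.split()}
    return words - STOPWORDS


def expand_entities(terms, graph, hops=1):
    """Add graph neighbours of every term that is a known entity."""
    expanded = set(terms)
    frontier = expanded & set(graph)
    for _ in range(hops):
        neighbours = set()
        for entity in frontier:
            neighbours |= graph[entity]
        neighbours -= expanded
        expanded |= neighbours
        frontier = neighbours & set(graph)
    return expanded


def retrieve(query, chunks, graph, k=2):
    """Rank chunks by overlap with the graph-expanded query terms."""
    terms = expand_entities(tokenize(query), graph)
    scored = [(len(terms & tokenize(text)), cid) for cid, text in chunks.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cid for _, cid in scored[:k]]


def build_prompt(query, chunk_ids, chunks):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"[{cid}] {chunks[cid]}" for cid in chunk_ids)
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    question = "Is the caprock intact?"
    print(build_prompt(question, retrieve(question, CHUNKS, GRAPH), CHUNKS))
```

With an empty graph, the question about the caprock matches only the chunk that names it. With the graph, the related reservoir chunk is also retrieved, which is the kind of relational context that graph-based retrieval is intended to supply.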

/content/papers/10.3997/2214-4609.202539032
2025-03-24
2026-02-15
References

  1. Brown, T., Mann, B., Ryder, N. et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
  2. Edge, D., Trinh, H., Cheng, N. et al. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv preprint arXiv:2404.16130.
  3. Lewis, P., Perez, E., Piktus, A. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33, 9459–9474.
  4. Vaswani, A., Shazeer, N., Parmar, N. et al. (2017). Attention is All You Need. Advances in Neural Information Processing Systems, 30, 5998–6008.
  5. Wang, J. (2024). GeoGPT, the large earth science language model system. EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18265. https://doi.org/10.5194/egusphere-egu24-18265
  6. Yao, S., Zhao, J., Yu, D. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv preprint arXiv:2210.03629.
  7. PilotStrategy Project. Details and reports are available on the project website: https://pilotstrategy.eu/