
Abstract

Summary

Digitizing legacy geological reports is important for enabling modern analytics, yet most existing Retrieval-Augmented Generation (RAG) pipelines struggle with accuracy and often produce hallucinations or inconsistent answers. In this work, we explore how LangGraph can make these workflows more reliable by adding correction loops and structured state handling. We tested three large language models (Meta LLaMA-3-90B, Anthropic Claude Sonnet, and DeepSeek R1) on geological well data and evaluated them from three perspectives: expert scoring (LLM-as-a-Judge), lexical alignment (TF-IDF with embeddings), and semantic similarity (Word2Vec with embeddings). Our results show that DeepSeek R1 provides the strongest semantic understanding and Claude Sonnet aligns best with expert phrasing, while LLaMA-3 delivers competitive but more variable outcomes. Overall, the LangGraph approach reduced errors, improved consistency, and gave a clearer picture of how different models perform in scientific Q&A. The study shows that LangGraph is not just a research framework but a practical way to make generative AI more dependable in specialized fields such as geoscience, and that the methodology can be extended to other industries facing similar digitization challenges.
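
To make the lexical-alignment perspective concrete, the following is a minimal sketch of a TF-IDF cosine score between an expert reference answer and a model answer. The summary does not specify the exact formulation used in the study, so the tokenizer, the smoothed IDF weighting, and the function names (`tfidf_vectors`, `cosine`, `lexical_alignment`) are all assumptions made for illustration, and the example sentences are invented.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lexical_alignment(reference, answer):
    """Score how closely a model answer reuses the reference wording."""
    ref_vec, ans_vec = tfidf_vectors([reference, answer])
    return cosine(ref_vec, ans_vec)


if __name__ == "__main__":
    # Hypothetical reference and answer, not taken from the study's data.
    reference = "the reservoir interval is a porous sandstone"
    answer = "the reservoir is a tight sandstone"
    print(round(lexical_alignment(reference, answer), 3))
```

A score near 1 means the answer reuses the reference vocabulary, while a score of 0 means no shared terms. Because such a score rewards surface overlap only, it is naturally paired with an embedding-based measure for paraphrased but correct answers.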


/content/papers/10.3997/2214-4609.202639012
2026-03-09
2026-02-13
