Volume 43, Issue 2
  • ISSN: 0263-5046
  • E-ISSN: 1365-2397

Abstract

Geoscientists and engineers often need quick, reliable answers from confidential or internal documents. Generic cloud-based chatbots struggle to provide accurate, industry-specific information, and they are not permitted to access internal knowledge bases. To solve this, we developed a self-hosted chatbot that combines a local Large Language Model (LLM) with an AI-based search system fine-tuned on offshore drilling data. Our setup provides reliable, domain-relevant responses without sending information to external servers, and it limits the generation of false information, known as ‘hallucination’. By keeping all data in-house and improving retrieval accuracy, this methodology offers a practical way to build secure, specialised chatbots for other subsurface applications. We provide open-source code and a setup guide to facilitate reproducibility and adoption.
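
The retrieve-then-answer pattern summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the bag-of-words cosine score stands in for the fine-tuned search model, the example passages are invented, and the names `retrieve` and `build_prompt` are hypothetical. The resulting prompt would be passed to a locally hosted LLM, a step omitted here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query.

    Assumption: a simple lexical score replaces the domain-tuned
    embedding model described in the abstract.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, context):
    """Build a prompt that restricts the model to the retrieved context,
    which is one common way to limit hallucination."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Invented example passages standing in for internal drilling documents.
PASSAGES = [
    "Stuck pipe occurred at 2100 m while tripping out of the hole.",
    "The mud weight was increased to control a gas influx.",
    "Crew change took place on Monday morning.",
]

question = "What was done about the gas influx?"
top = retrieve(question, PASSAGES, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Because both the retrieval step and the prompt construction run locally, no document text leaves the machine; only the choice of scoring function and of the local model would change in a real deployment.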

/content/journals/10.3997/1365-2397.fb2025012
2025-02-01
2025-02-19

  • Article Type: Research Article