
Abstract

Summary

Data extracted from the geotechnical documentation of past projects can be essential for informed decision-making. However, much of this data exists in unstructured formats, such as scanned documents containing images of text, photos, and handwritten notes, which poses challenges for data extraction and analysis. Some methods, primarily based on machine learning (ML), can be applied successfully to extract features from images. However, all of these methods require the user to pre-define the features to be found in the image of a document, for example text extraction with optical character recognition (OCR), or image segmentation for detecting signs of weathering, detecting joints, or recognizing lithology. A new AI/ML model must be built and trained for every feature the user would like to extract.

This study proposes using advances in generative artificial intelligence (GenAI) to automate the extraction, structuring, and integration of information from unstructured geotechnical documents without pre-defining the exact features to be extracted. By employing large language models (LLMs) such as GPT models, the proposed approach aims to transform images of archived documents into structured datasets. By integrating expertise from past projects through data analysis, the study aims to improve the accuracy and robustness of decision-support systems in geotechnical engineering.
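The idea of extracting fields without fixing them in advance can be illustrated with a minimal sketch. It is not the study's implementation: the prompt text, the `ask_model` callable (a placeholder wrapping whichever vision-capable model is used), and the function names `extract_page` and `build_dataset` are all assumptions made for illustration. The sketch only shows how per-page replies with differing fields might be merged into one structured table.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt: asks for whatever fields the page contains,
# rather than a fixed, pre-defined feature list.
PROMPT = (
    "List every piece of geotechnical information visible on this page "
    "as a flat JSON object of field names and values."
)


def extract_page(image_bytes: bytes,
                 ask_model: Callable[[str, bytes], str]) -> Dict[str, object]:
    """Query a model about one page image and parse its reply as a JSON object.

    `ask_model` is a placeholder supplied by the caller; it wraps the model
    API actually in use and returns the raw text reply.
    """
    reply = ask_model(PROMPT, image_bytes)
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return {}  # unparseable reply: treat as no extracted fields
    return data if isinstance(data, dict) else {}


def build_dataset(pages: Dict[str, bytes],
                  ask_model: Callable[[str, bytes], str]) -> List[Dict[str, object]]:
    """Merge per-page results into rows that share one set of columns."""
    extracted = {name: extract_page(img, ask_model) for name, img in pages.items()}
    # Columns are discovered from the replies, not declared up front.
    columns = sorted({key for fields in extracted.values() for key in fields})
    return [
        {**{col: fields.get(col) for col in columns}, "page": name}
        for name, fields in extracted.items()
    ]
```

Fields that a page does not mention are filled with `None`, so every row has the same columns and the result can be loaded directly into a table or data frame for later analysis.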

/content/papers/10.3997/2214-4609.202539065
2025-03-24
2025-11-16
