Data extracted from the geotechnical documentation of past projects can be essential for informed decision-making. However, much of this data exists in unstructured formats, such as scanned documents containing images of text, photos, and handwritten notes, which makes extraction and analysis difficult. Some methods, primarily based on machine learning (ML), can be applied successfully to extract features from images. However, all of these methods require the user to pre-define the features to be found in the image of a document, for example text extraction with optical character recognition (OCR), or image segmentation for detecting signs of weathering, detecting joints, or recognizing lithology. A new AI/ML model must be built and trained for every feature the user would like to extract.
This study proposes using advances in generative artificial intelligence (GenAI) to automate the extraction, structuring, and integration of information from unstructured geotechnical documents without pre-defining the exact features to be extracted. By employing large language models (LLMs) and GPT technology, the proposed approach aims to transform images of archived documents into structured datasets. By making the expertise recorded in past projects available for data analysis, the study aims to improve the accuracy and robustness of decision-support systems in geotechnical engineering.