Automated Extraction of Images of Interest in Document Collections: End-to-End Workflow and Operational Case-Study
- Publisher: European Association of Geoscientists & Engineers
- Source: Conference Proceedings, Third EAGE Digitalization Conference and Exhibition, Mar 2023, Volume 2023, p.1 - 5
Abstract
Data extraction is the process of analyzing unstructured information and transforming it into structured data. Structured data can then generate meaningful insights for reporting and analytics in companies. Automating such tasks can improve the efficiency of operational workflows and free professionals' time for more advanced, higher-value activities in their daily work. Recently, Machine Learning, Computer Vision and Natural Language Processing have been intensively developed and widely employed to automate information extraction. However, few practical case studies on operational geoscience data have been documented. In this paper, we develop an integrated workflow to automate the extraction of images of interest and their associated information from geoscience documents. The workflow relies on a combination of free Python packages for Natural Language Processing, Computer Vision, Optical Character Recognition and Machine Learning. It was applied to a case study using data from the LUGOS Oil Field. The objective was to automatically extract and document the evolving interpretation of the principal structural maps over several decades of field development. The proposed workflow gave very positive results: the whole automated process achieved a success rate above 90% on the case study and took only 5 hours, compared with several weeks of manual work.
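The abstract does not name the packages or the selection criteria used, so the following is only a minimal sketch of one step such a workflow might contain: deciding whether an extracted image is "of interest" from the caption or OCR text attached to it. The `ImageRecord` type, the `KEYWORDS` list, and the file names are all hypothetical, and plain keyword matching is an assumed stand-in for whatever combination of Natural Language Processing and Machine Learning the paper actually uses.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the paper does not state which terms were used.
KEYWORDS = ("structural map", "depth map", "top reservoir")


@dataclass
class ImageRecord:
    document: str  # source document name (illustrative)
    page: int      # page on which the image was found
    text: str      # caption or OCR text associated with the image


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR line breaks do not block matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_image_of_interest(record: ImageRecord, keywords=KEYWORDS) -> bool:
    """Return True if any keyword occurs in the record's normalized text."""
    text = normalize(record.text)
    return any(keyword in text for keyword in keywords)


def select_images(records, keywords=KEYWORDS):
    """Keep only the records whose associated text matches a keyword."""
    return [r for r in records if is_image_of_interest(r, keywords)]


# Illustrative input, standing in for the output of image extraction plus OCR.
records = [
    ImageRecord("report_1985.pdf", 12, "Figure 4: Structural   Map\nat top reservoir"),
    ImageRecord("report_1985.pdf", 30, "Company logo"),
    ImageRecord("report_2001.pdf", 7, "Well log correlation panel"),
]
selected = select_images(records)
```

In a real pipeline the `text` field would come from an OCR engine and the filter would probably be a trained classifier; the keyword rule only illustrates where that decision sits in the workflow.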