Abstract

Summary

Data extraction is the process of analyzing unstructured information and transforming it into structured data. Structured data can then yield meaningful insights for company reporting and analytics. Automating such tasks can improve the efficiency of operational workflows and free up professionals' time for more advanced, higher-value activities in their daily work. Recently, Machine Learning, Computer Vision and Natural Language Processing have been intensively developed and widely used to automate information extraction. However, few practical case studies on operational geoscience data have been documented. In this paper, we develop an integrated workflow to automate the extraction of images of interest, and their associated information, from geoscience documents. The workflow relies on a combination of free Python packages for Natural Language Processing, Computer Vision, Optical Character Recognition and Machine Learning. It was applied to a case study using data from the LUGOS Oil Field. The objective was to automatically extract and document how the interpretation of the principal structural maps evolved over several decades of field development. The workflow gave very positive results: the whole automated process achieved a success rate above 90% on the case study and ran in only 5 hours, compared with several weeks of manual work.
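As a rough illustration of the final stage of such a workflow, the sketch below selects images whose captions suggest a structural map and orders them by the year found in the caption. It is not the authors' implementation. It assumes the caption text has already been produced by an upstream OCR step, and the keyword list, the year pattern, the `ImageRecord` type and the sample captions are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword list: the abstract does not say how maps are identified.
MAP_KEYWORDS = ("structural map", "structure map", "depth map")

# Assumed range of plausible interpretation years (1950 to 2049).
YEAR_PATTERN = re.compile(r"\b(19[5-9]\d|20[0-4]\d)\b")


@dataclass
class ImageRecord:
    document: str                 # source report the image came from
    page: int                     # page number within that report
    caption: str                  # caption text, assumed to come from OCR
    year: Optional[int] = None    # filled in by build_timeline


def is_structural_map(caption: str) -> bool:
    """Return True if the caption contains one of the assumed keywords."""
    text = caption.lower()
    return any(keyword in text for keyword in MAP_KEYWORDS)


def extract_year(caption: str) -> Optional[int]:
    """Return the first plausible year in the caption, or None."""
    match = YEAR_PATTERN.search(caption)
    return int(match.group(1)) if match else None


def build_timeline(records: List[ImageRecord]) -> List[ImageRecord]:
    """Keep structural maps only and sort them by year, undated ones last."""
    selected = [
        ImageRecord(r.document, r.page, r.caption, extract_year(r.caption))
        for r in records
        if is_structural_map(r.caption)
    ]
    return sorted(selected, key=lambda r: (r.year is None, r.year or 0))


# Invented sample captions, for illustration only.
SAMPLE = [
    ImageRecord("report_b.pdf", 12, "Depth map at top reservoir (2004 update)"),
    ImageRecord("report_a.pdf", 3, "Top reservoir structural map, 1987 interpretation"),
    ImageRecord("report_a.pdf", 7, "Well log correlation panel, 1992"),
    ImageRecord("report_c.pdf", 5, "Structural map, undated"),
]

if __name__ == "__main__":
    for record in build_timeline(SAMPLE):
        print(record.year, record.document, record.page, record.caption)
```

Keyword matching is only a simple stand-in here. A workflow like the one described would more plausibly use a trained image classifier for this selection step.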

/content/papers/10.3997/2214-4609.202332025
2023-03-20
2024-04-27
http://instance.metastore.ingenta.com/content/papers/10.3997/2214-4609.202332025