Machine Learning to Support Technical Document Indexing, a Case Study on Seismic Acquisition Reports
- Publisher: European Association of Geoscientists & Engineers
- Source: Conference Proceedings, 80th EAGE Conference and Exhibition 2018, Jun 2018, Volume 2018, p.1 - 5
Abstract
From the drill floor to the top floor, all exploration decisions are based on data. Today, the industry-standard formats proposed by the SEG or Energistics are structured and facilitate the transfer and archiving of measurements together with their associated metadata. The XML formats proposed by Energistics, such as WITSML™, also make it possible to stream information in support of real-time decisions.
Nevertheless, to fully understand the context of a survey, geoscientists still have to consult the acquisition reports. These reports are available as unstructured PDF or TIFF files, which are very difficult to index automatically at large scale.
Various attempts to apply deterministic data mining approaches have been disappointing because of the high variability of report formats and layout styles.
To illustrate the potential of machine learning systems to automatically index subsurface-related documents, we have built learning models to detect 20 metadata items in seismic acquisition, QAQC, HSE and navigation reports. This has confirmed the capacity of ML to index large volumes of documents on demand. It also opens the possibility of extracting data from unstructured documents prior to applying classical modelling or data analytics.