From the drill floor to the top floor, all exploration decisions are based on data. Today, the structured industry-standard formats proposed by the SEG or Energistics facilitate the transfer and archiving of measurements together with their associated metadata. The XML formats proposed by Energistics, such as WITSML™, also make it possible to stream information in support of real-time decisions.

Nevertheless, to fully understand the context of a survey, geoscientists still have to consult the acquisition reports. These reports are available in unstructured PDF or TIFF formats, which are very difficult to index automatically at large scale.

Attempts to apply deterministic data-mining approaches have been disappointing because of the high variability of report formats and layout styles.
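
As an illustration of why fixed rules struggle with layout variability, the following minimal Python sketch applies a single hand-written pattern to three report excerpts. The excerpts, the vessel name and the rule itself are invented for this example and are not taken from the study.

```python
import re

# Hypothetical rule: expects a "Vessel: <name>" label at the start of a line.
VESSEL_RULE = re.compile(r"^Vessel\s*:\s*(?P<name>.+)$", re.MULTILINE)


def extract_vessel(text):
    """Return the vessel name matched by the fixed rule, or None."""
    match = VESSEL_RULE.search(text)
    return match.group("name").strip() if match else None


# Three invented excerpts that state the same fact in different layouts.
report_a = "Survey: Block 7 3D\nVessel: Example Explorer\nClient: Example Oil"
report_b = "The survey was acquired by the M/V Example Explorer in 2016."
report_c = "VESSEL NAME ........ Example Explorer"

results = [extract_vessel(r) for r in (report_a, report_b, report_c)]
print(results)
```

Only the first excerpt follows the layout the rule expects; the other two carry the same information but are missed, and each new layout would need a new rule.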

To illustrate the potential of machine learning systems to automatically index subsurface-related documents, we have built learning models that detect 20 metadata items in seismic acquisition, QA/QC, HSE and navigation reports. This has confirmed the capacity of ML to index large volumes of documents on demand. It also opens the possibility of extracting data from unstructured documents before applying classical modelling or data analytics.
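
The abstract does not describe the models themselves, so the sketch below is only a toy stand-in for the general idea: a model learns from labelled examples which metadata item a sentence relates to, instead of relying on a fixed layout. It uses a small multinomial naive Bayes classifier written with the Python standard library. The technique, the two labels (`vessel`, `datum`), the class name `NaiveBayes` and all training sentences are assumptions made for illustration; they are not the authors' method or their 20 metadata items.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of samples
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training sentences; labels are hypothetical metadata items.
TRAINING = [
    ("Vessel: Example Explorer", "vessel"),
    ("The survey was acquired by the M/V Example Explorer", "vessel"),
    ("Acquisition vessel Example Pioneer towed eight streamers", "vessel"),
    ("Geodetic datum: WGS 84", "datum"),
    ("Positions are referenced to the WGS 84 datum", "datum"),
    ("Coordinates use datum ED50 and UTM zone 31N projection", "datum"),
    ("Weather downtime was recorded on three days", "other"),
    ("No lost time incidents were reported during the survey", "other"),
    ("The crew change took place in port", "other"),
]

model = NaiveBayes().fit(TRAINING)
for sentence in ("Recording vessel: Example Voyager",
                 "All coordinates refer to datum WGS 84"):
    print(sentence, "->", model.predict(sentence))
```

Neither test sentence appears in the training set or follows a fixed layout; the classifier labels them from word statistics alone. A production system would need far more labelled data and a stronger model than this sketch.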




