
Abstract

Summary

There is a tremendous amount of information stored in digital geoscientific documents and published reports in the energy industry. These documents distill reservoir information from diverse disciplines (geologists, geophysicists, petrophysicists and drillers); they are stored in unstructured formats and are used again in later reservoir-modeling stages. In particular, national data management repositories and oil companies host huge volumes of historical well reports containing information such as lithology, hydrocarbon shows and other reservoir data. Because of the large volume, the variety of vintages and the non-standardized formats, extracting the valuable information that serves as input for interpretation is an arduous and very time-consuming task. Our solution is ElasticDocs, a machine learning-enabled platform running in a hybrid cloud container that automatically reads and understands hundreds or thousands of technical documents with little human supervision, through a combination of machine learning and search techniques including optical character recognition (OCR), Elasticsearch, natural language processing (NLP), clustering and deep convolutional neural networks. The platform uses a hybrid, two-tier data service architecture that leverages the strengths of both local servers and the cloud to enhance data security, integrity and accessibility.
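To make the extraction-and-search idea concrete, the following is a minimal sketch of the two simplest pieces such a pipeline could contain once OCR has produced plain text: tagging lithology terms and hydrocarbon-show phrases in a report, and an inverted index for keyword search. It is an illustration under stated assumptions, not the ElasticDocs implementation. The vocabularies, the regular expression, the function names (`extract_entities`, `build_index`, `search`) and the sample reports are all hypothetical, and the toy index only mimics the inverted-index structure that full-text engines such as Elasticsearch are built on; it does not use Elasticsearch's API.

```python
import re
from collections import defaultdict

# Hypothetical lithology vocabulary for illustration; not taken from the paper.
LITHOLOGY_TERMS = {"sandstone", "shale", "limestone", "dolomite", "siltstone", "claystone"}
# Hypothetical pattern for hydrocarbon-show phrases such as "oil shows" or "gas show".
SHOW_PATTERN = re.compile(r"\b(?:oil|gas|hydrocarbon)\s+shows?\b", re.IGNORECASE)
TOKEN_PATTERN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lower-cased alphabetic tokens; a crude stand-in for an NLP tokenizer."""
    return TOKEN_PATTERN.findall(text.lower())


def extract_entities(text):
    """Return sorted lithology terms and show phrases mentioned in one report."""
    lithology = sorted(set(tokenize(text)) & LITHOLOGY_TERMS)
    shows = sorted({m.group(0).lower() for m in SHOW_PATTERN.finditer(text)})
    return {"lithology": lithology, "shows": shows}


def build_index(docs):
    """Inverted index: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical OCR output for two well reports.
    reports = {
        "well_A": "Fine-grained sandstone with good oil shows at 2450 m.",
        "well_B": "Grey shale, no hydrocarbon shows observed.",
    }
    print(extract_entities(reports["well_A"]))
    # {'lithology': ['sandstone'], 'shows': ['oil shows']}
    print(sorted(search(build_index(reports), "shale shows")))
    # ['well_B']
```

The keyword approach is deliberately naive: the show pattern also matches negated phrases such as "no hydrocarbon shows" in `well_B`, which is exactly the kind of context that a trained NLP model, rather than a regular expression, is needed to resolve.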


DOI: 10.3997/2214-4609.201803242
2018-12-03
2024-04-27
