Machine learning has grown in importance across several industries, becoming both an essential tool and a competitive advantage. However, questions about training data lineage, or provenance (e.g., “where did the data used to train this model come from?”), the introduction of several new data protection laws, and data governance requirements have hindered the adoption of machine learning models in the real world.

In this paper, we discuss how data lineage can be leveraged to benefit the Machine Learning (ML) lifecycle when building ML models that discover sweet spots for shale oil and gas production, a major application for the Oil and Gas (O&G) industry.
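The provenance question in the first paragraph can be answered mechanically once every lifecycle step records what it read and what it wrote. The sketch below assumes a simple in-memory store that is loosely modeled on the *used* and *wasGeneratedBy* relations of the W3C PROV data model. It walks upstream from a trained model to the raw inputs it derives from. The class, its methods, and the file names (`raw_well_logs.las`, `production_history.csv`, and so on) are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    """Hypothetical minimal lineage store, loosely modeled on W3C PROV:
    activities *use* entities and entities are *generated by* activities."""
    # activity name -> entity ids the activity consumed ("used")
    used: dict[str, list[str]] = field(default_factory=dict)
    # entity id -> activity that produced it ("wasGeneratedBy")
    generated_by: dict[str, str] = field(default_factory=dict)

    def record(self, activity: str, inputs: list[str], outputs: list[str]) -> None:
        """Record one lifecycle step: `activity` read `inputs` and wrote `outputs`."""
        self.used.setdefault(activity, []).extend(inputs)
        for out in outputs:
            self.generated_by[out] = activity

    def trace_sources(self, entity: str) -> set[str]:
        """Walk upstream from `entity` and return the original inputs it
        derives from, i.e. entities that no recorded activity generated."""
        sources: set[str] = set()
        seen: set[str] = set()
        stack = [entity]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            activity = self.generated_by.get(current)
            if activity is None:
                sources.add(current)  # nothing produced it: an original source
            else:
                stack.extend(self.used.get(activity, []))
        return sources


if __name__ == "__main__":
    # Hypothetical sweet-spot modeling pipeline; all names are illustrative.
    g = LineageGraph()
    g.record("clean_well_logs", ["raw_well_logs.las"], ["well_logs_clean.parquet"])
    g.record("merge_features",
             ["well_logs_clean.parquet", "production_history.csv"],
             ["training_set.parquet"])
    g.record("train_model", ["training_set.parquet"], ["sweet_spot_model.pkl"])
    print(sorted(g.trace_sources("sweet_spot_model.pkl")))
    # ['production_history.csv', 'raw_well_logs.las']
```

Keeping lineage as explicit input/output records, rather than inferring it afterwards, is what makes the trace-back query a simple graph walk; a production system would persist these records (for example, in a provenance database) instead of holding them in memory.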

