The oil and gas industry presents a challenging and exciting environment for data projects because the data collected are large, complex, and highly variable in format, type, and quality. In many companies, this makes delivering and maintaining a data science pipeline from source systems through to the end user an enormous challenge ( ). Many projects fail before any analytics can even be applied to the data because of difficulties handling legacy systems, data silos, complex dependencies between data sources, and more. In other cases, a data science project can advance in only one area or division of a company because of differences in data handling, even though it is broadly applicable across the company’s assets. This presentation will discuss California Resources Corporation’s new company-wide data analytics effort as a case study. It shows how we have used technologies such as data virtualization ( ) and software architecture principles such as abstraction to tackle difficult data integration and data quality problems and to construct a data science pipeline capable of delivering results company-wide. Many of these problems have frustrated multimillion-dollar attempts to address them in the recent past.




  1. Martin, R.
    [2017] The database is a detail. Clean Architecture: A Craftsman’s Guide to Software Structure and Design. 277–281. ISBN 978-0134494166.
  2. Van Der Lans, R.
    [2018] Architecting the Multi-Purpose Data Lake with Data Virtualization. Denodo whitepapers.
  3. Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., and Young, M.
    [2014] Machine Learning: The High Interest Credit Card of Technical Debt. Software Engineering for Machine Learning (NIPS 2014 Workshop).
