Abstract

Summary

This study explores the integration of data lakehouse platforms, Apache Spark, and business intelligence (BI) tools to make historical drilling data analysis for well planning more efficient. Advanced downhole equipment generates terabytes of high-frequency data, which calls for robust storage and processing solutions. Using a corporate data lake and cloud-based PySpark, the study organizes and interconnects hundreds of well datasets and loads them into a dedicated database for visualization in BI tools.

Key analyses included Dogleg Severity (DLS) computations and machine learning (ML)-based bit performance evaluation using Logging While Drilling (LWD) data. Challenges such as data preprocessing, outlier removal, and code validation were addressed through iterative development and progressively refined conditions. The study highlights the automation of offset well analysis, which significantly reduces time and effort compared with manual approaches.
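The abstract does not state how DLS was computed. As an illustration only, the sketch below uses the conventional minimum-curvature dogleg angle between two survey stations, normalized to a course length. The function name `dogleg_severity`, its parameter names, and the 30 m default course length are assumptions made for this example, not details taken from the study.

```python
import math


def dogleg_severity(md1, inc1, azi1, md2, inc2, azi2, course_length=30.0):
    """Return DLS in degrees per `course_length` between two survey stations.

    md1, md2   : measured depths of the two stations (same unit as course_length)
    inc1, inc2 : inclinations in degrees
    azi1, azi2 : azimuths in degrees
    The 30.0 default (degrees per 30 m) is an assumed convention.
    """
    delta_md = md2 - md1
    if delta_md <= 0:
        raise ValueError("measured depth must increase between stations")

    i1, i2 = math.radians(inc1), math.radians(inc2)
    delta_azi = math.radians(azi2 - azi1)

    # Cosine of the dogleg angle (spherical law of cosines form).
    cos_dogleg = (math.cos(i1) * math.cos(i2)
                  + math.sin(i1) * math.sin(i2) * math.cos(delta_azi))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_dogleg = max(-1.0, min(1.0, cos_dogleg))

    dogleg_deg = math.degrees(math.acos(cos_dogleg))
    return dogleg_deg * course_length / delta_md


# Example: inclination builds from 10 to 13 degrees over 30 m at constant azimuth.
dls = dogleg_severity(1000.0, 10.0, 45.0, 1030.0, 13.0, 45.0)
```

In a PySpark setting, the same arithmetic could be applied per well over consecutive survey rows, for example by pairing each station with the previous one; that wiring is left out here because the study's schema is not given.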

Novel contributions include enabling drilling engineers to take on the roles of data engineers and data scientists. The findings underscore the potential of ML to automate analytical workflows and extract actionable insights from extensive datasets, driving efficiency and innovation in drilling operations.

/content/papers/10.3997/2214-4609.202539005
2025-03-24
2026-02-18