1887
Volume 73, Issue 8
  • E-ISSN: 1365-2478

Abstract

Training machine learning models on large datasets often reveals more information about the target variable and helps to avoid overfitting. However, these advantages come with challenges such as data noise and redundancy. In the present study on a relatively large well log dataset (40 wells from the Cambay Basin), we apply different classes of feature selection methods (filter‐based, wrapper‐based and embedded methods) to obtain the optimal feature set for accurate prediction of sonic logs. Additionally, we use boxplot and histogram analysis to remove outliers present in the dataset. Subsequently, we use XGBoost as our machine learning model, with fivefold cross‐validation and a 70:30 split. We then predict the sonic log data in a blind well. We establish that the maximum relevance minimum redundancy method shows the best results, with an R‐squared value of 63%, when we select three out of six features – depth, neutron porosity and bulk density. The significance of the results is demonstrated using one‐way analysis of variance and Tukey's honestly significant difference test. The selection of these features is further validated by established geophysical principles in the form of empirical relationships.
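
The outlier removal and maximum relevance minimum redundancy steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the usual 1.5 × IQR boxplot rule for outliers, and it swaps in absolute Pearson correlation as a simple stand‐in for the relevance and redundancy measures. The data are synthetic, and the function names `iqr_mask` and `mrmr_select` are hypothetical.

```python
import numpy as np


def iqr_mask(X, k=1.5):
    """Return a boolean mask of rows whose values all lie inside the
    boxplot whiskers (Q1 - k*IQR, Q3 + k*IQR) of every column."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)


def mrmr_select(X, y, n_select):
    """Greedy mRMR-style selection (assumption: |Pearson r| is used for
    both relevance to y and redundancy between features)."""
    n_features = X.shape[1]
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    # Start with the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        remaining = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus mean redundancy with features already chosen.
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected


# Synthetic stand-in data (not well logs): column 1 nearly duplicates
# column 0, column 2 carries independent signal, column 3 is pure noise.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
X = np.column_stack([a, a + 0.01 * rng.normal(size=n), b, rng.normal(size=n)])
y = 2.0 * a + b + 0.1 * rng.normal(size=n)
X[0, 3] = 100.0  # inject one obvious outlier

mask = iqr_mask(X)
selected = mrmr_select(X[mask], y[mask], n_select=2)
print("rows kept:", int(mask.sum()), "of", n)
print("selected feature indices:", selected)
```

In this toy setting the redundancy penalty keeps the near‐duplicate column out of the second slot, which is the behaviour the method is designed to produce; a mutual‐information criterion could be substituted for the correlation measure without changing the greedy structure.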

/content/journals/10.1111/1365-2478.70095
2025-10-22
2025-11-09