Abstract

Summary

Unlocking the potential of historical Gas Chromatography-Mass Spectrometry (GC-MS) datasets for AI-driven insights has remained a challenge due to decades of variability in data acquisition. This study presents a transformative AI-based methodology to globally align 18,000 GC-MS chromatograms from Petrobras oil samples, creating a standardized, AI-ready geochemical dataset.

The methodology introduces a novel data-driven reference framework to guide chromatographic alignment, integrating classical processes such as baseline correction, peak extraction and normalization with a convolutional model. This approach resolves inconsistencies while retaining critical geochemical features, ensuring datasets are consistently aligned across diverse samples. Validation through explained variance and UMAP projections demonstrates significant improvements in peak consistency and sample differentiation, particularly for petroleum origin, maturity, and depositional environment classification.
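The classical steps named above (baseline correction, normalization, peak extraction) can be illustrated with a minimal sketch. This is not the authors' pipeline: the rolling-minimum baseline, the unit-maximum normalization, the function name `preprocess_chromatogram`, and the `window` and `prominence` values are all assumptions chosen for illustration, and the trace is synthetic.

```python
import numpy as np
from scipy.ndimage import minimum_filter1d, uniform_filter1d
from scipy.signal import find_peaks


def preprocess_chromatogram(signal, window=101, prominence=0.02):
    """Baseline-correct, normalize, and extract peaks from one trace.

    Hypothetical helper; parameter values are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    # Rough baseline: rolling minimum followed by a moving average.
    # This is a simple stand-in for penalized least squares baselines.
    baseline = uniform_filter1d(minimum_filter1d(signal, size=window), size=window)
    corrected = np.clip(signal - baseline, 0.0, None)
    # Scale to unit maximum so traces from different runs are comparable.
    peak_max = corrected.max()
    normalized = corrected / peak_max if peak_max > 0 else corrected
    # Keep only local maxima that stand out by at least `prominence`.
    peaks, _ = find_peaks(normalized, prominence=prominence)
    return normalized, peaks


# Synthetic trace: two Gaussian peaks on a linearly drifting baseline.
t = np.linspace(0.0, 60.0, 3000)  # retention time axis (minutes, assumed)
trace = (
    0.05 * t
    + 10.0 * np.exp(-0.5 * ((t - 20.0) / 0.3) ** 2)
    + 6.0 * np.exp(-0.5 * ((t - 41.0) / 0.3) ** 2)
)
normalized, peaks = preprocess_chromatogram(trace)
peak_times = t[peaks]  # retention times of the detected peaks
```

In a setup like this, the extracted peak positions would be the features that a downstream alignment model compares against a reference; how the convolutional model consumes them is not specified in the abstract.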

By addressing challenges in chromatographic alignment, this framework bridges traditional geochemical analysis and modern AI applications. It generalizes to other chromatographic techniques, providing a scalable solution for preparing diverse datasets for machine learning while maintaining the integrity of essential features.

/content/papers/10.3997/2214-4609.202539031
2025-03-24
2026-02-16