Unlocking the potential of historical Gas Chromatography-Mass Spectrometry (GC-MS) datasets for AI-driven insights has remained a challenge due to decades of variability in data acquisition. This study presents a transformative AI-based methodology to globally align 18,000 GC-MS chromatograms from Petrobras oil samples, creating a standardized, AI-ready geochemical dataset.
The methodology introduces a novel data-driven reference framework to guide chromatographic alignment, integrating classical processes such as baseline correction, peak extraction and normalization with a convolutional model. This approach resolves inconsistencies while retaining critical geochemical features, ensuring datasets are consistently aligned across diverse samples. Validation through explained variance and UMAP projections demonstrates significant improvements in peak consistency and sample differentiation, particularly for petroleum origin, maturity, and depositional environment classification.
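To make the classical steps named above concrete, the following is a minimal Python sketch of baseline correction, normalization and peak extraction on a single one-dimensional trace. It is an illustration under stated assumptions, not the study's implementation: the rolling-minimum baseline, total-intensity normalization, local-maximum peak picking, and all function and parameter names (`subtract_baseline`, `normalize`, `extract_peaks`, `preprocess`, `window`, `min_height`) are hypothetical choices made here. The data-driven reference framework and the convolutional alignment model are not covered.

```python
from typing import List, Tuple


def subtract_baseline(signal: List[float], window: int = 5) -> List[float]:
    """Estimate the baseline as a rolling minimum and subtract it.

    Assumption: the window is wider than any peak, so every window
    contains at least one point that lies on the baseline.
    """
    n = len(signal)
    half = window // 2
    corrected = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        corrected.append(signal[i] - min(signal[lo:hi]))
    return corrected


def normalize(signal: List[float]) -> List[float]:
    """Scale the trace so its intensities sum to 1 (total-intensity normalization)."""
    total = sum(signal)
    if total == 0:
        return list(signal)  # flat trace: nothing to scale
    return [value / total for value in signal]


def extract_peaks(signal: List[float], min_height: float) -> List[Tuple[int, float]]:
    """Return (index, height) for local maxima at or above min_height."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= min_height:
            peaks.append((i, signal[i]))
    return peaks


def preprocess(
    signal: List[float], window: int = 5, min_height: float = 0.05
) -> Tuple[List[float], List[Tuple[int, float]]]:
    """Chain the three steps: baseline correction, normalization, peak extraction."""
    corrected = subtract_baseline(signal, window)
    normalized = normalize(corrected)
    return normalized, extract_peaks(normalized, min_height)


# Toy trace: constant baseline of 1.0 with two narrow peaks.
trace = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0]
normalized_trace, peak_list = preprocess(trace)
```

In this sketch the peak list is computed on the normalized trace, so `min_height` is a fraction of total intensity rather than a raw detector count. Real chromatograms would typically call for a more robust baseline estimator and a window tuned to the expected peak width.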
By addressing challenges in chromatographic alignment, this framework bridges traditional geochemical analysis and modern AI applications. It generalizes to other chromatographic techniques, providing a scalable solution for preparing diverse datasets for machine learning while maintaining the integrity of essential features.