Performing Successful Data Science in the Geoscience Domain
- Publisher: European Association of Geoscientists & Engineers
- Source: Conference Proceedings, 79th EAGE Conference and Exhibition 2017 - Workshops, Jun 2017, cp-519-00002
- ISBN: 978-94-6282-219-1
Abstract
To perform data science with scientific data, we must represent the scientific problem space in a form that allows analytics. This requires blending traditional physics-based algorithms with modern advanced analytics, performed on datasets large enough to yield statistically robust insights. The insights exposed in the data must then be explained by scientists, which drives creative thinking, in contrast to application-driven workflows where line-of-sight to the original data is typically absent. We show how an open approach to data parsing, storage and integration drives better understanding of data and, moreover, enables the development and deployment of open source tools for processing, analysing and visualising data and insights. Dealing with measurement data brings challenges of quality, sparsity and irregular sampling, in datasets that must be integrated in the spatial, time and frequency domains. This is time-consuming work, often taking 80% of the time of each analytical study, and so we recommend that data from the geoscience domain be curated in a “load once, use many times” paradigm. Higher-level parameters can then be created to capture the scientific insights of multi-physics systems for use in one-off or operationalised descriptions of a system. After implementing this level of abstraction, the geoscientific world is ready for data science.