To perform data science with scientific data, we must represent the scientific problem space in a way that allows analytics. This requires blending traditional physics-based algorithms with modern advanced analytics, performed on datasets large enough to yield statistically robust insights. The insights exposed in the data must then be explained by scientists, driving creative thinking, in contrast to application-driven workflows, where line-of-sight to the original data is typically absent. We show how an open approach to data parsing, storage and integration drives better understanding of data and, moreover, enables the development and deployment of open-source tools for processing, analysing and visualising data and insights. Measurement data bring challenges of quality, sparsity and irregular sampling, in datasets that must be integrated across the spatial, time and frequency domains. This is time-consuming work, often taking 80% of each analytical study, and so we recommend that data from the geoscience domain be curated in a “load once, use many times” paradigm. Higher-level parameters can then be created to capture the scientific insights of multi-physics systems for use in one-off or operationalised descriptions of a system. With this level of abstraction in place, the geoscientific world is ready for data science.
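To make the “load once, use many times” idea concrete, the following is a minimal sketch (not from the article) of a curation step for irregularly sampled measurement data, using pandas. The file contents, column names and the one-minute target grid are illustrative assumptions.

```python
# Illustrative sketch: curate irregular, sparse sensor readings once,
# so downstream studies reuse a clean, regularly sampled table.
import io
import pandas as pd

# Assumed raw input: irregular timestamps, a gap, and a missing value.
raw_csv = io.StringIO(
    "timestamp,pressure_kPa\n"
    "2024-01-01 00:00:07,101.3\n"
    "2024-01-01 00:00:58,101.4\n"
    "2024-01-01 00:03:02,\n"       # sparse/missing measurement
    "2024-01-01 00:04:11,101.9\n"
)

def curate(source) -> pd.DataFrame:
    """Parse once, clean, and resample onto a regular 1-minute grid."""
    df = pd.read_csv(source, parse_dates=["timestamp"]).set_index("timestamp")
    df = df.dropna()                      # quality: drop unusable readings
    regular = df.resample("1min").mean()  # irregular -> regular sampling
    regular["pressure_kPa"] = regular["pressure_kPa"].interpolate()  # fill gaps
    return regular

curated = curate(raw_csv)
# Downstream analyses reuse the curated table rather than re-parsing raw
# files, e.g. curated.to_parquet("pressure_curated.parquet").
```

Persisting the curated, regularly gridded table (for example to Parquet) is what makes the subsequent spatial, time and frequency-domain integration a reuse step rather than a repeated parsing effort.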

