25th International Conference and Exhibition – Interpreting the Past, Discovering the Future
  • ISSN: 2202-0586

Abstract

Big Data techniques have the potential to be paradigm-changing for applied geoscience if they are used widely. A significant number of such techniques, under the umbrella of Earth informatics, involve Machine Learning applied to high dimensional data to create new forms of value. This contribution presents two case studies of successful Earth informatics computation and the communication of the value of results, which provide insight into the uptake of ‘Big Data’ in geosciences.

Machine Learning techniques divide naturally into supervised and unsupervised approaches. Supervised algorithms, such as Random Forests (RF), support vector machines and neural networks, share the concept of training a classifier on an initial (training) dataset. They are generally applied to predictive tasks, such as our first case study, which predicts lithology from remote sensing and airborne geophysical data. Unsupervised algorithms, such as Self-Organising Maps (SOM), allow patterns inherent in the data to emerge without the use of a training dataset. They are generally applied to exploratory tasks, such as our second case study, which identifies new, potentially prospective river catchments. We find that explicitly calculating and presenting the newly extracted value of the computed result is an essential component of post-compute evaluation.
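
The contrast between the two approaches can be illustrated with a minimal sketch. It is not the method used in either case study. For brevity, a nearest-centroid classifier stands in for Random Forests as the supervised example, and a two-node, one-dimensional SOM serves as the unsupervised example. The data values, the class names `unit_A` and `unit_B`, and all function names are hypothetical.

```python
import math
import random

# Hypothetical two-feature samples forming two well-separated groups.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
samples = group_a + group_b
labels = ["unit_A"] * 4 + ["unit_B"] * 4  # used by the supervised step only


def nearest_centroid_fit(samples, labels):
    """Supervised: learn one mean vector per class from labelled data."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid lies closest to x."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def som_fit(samples, n_nodes=2, epochs=50, seed=0):
    """Unsupervised: fit a 1-D Self-Organising Map; labels are never seen."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs      # decays linearly towards zero
        rate = 0.5 * frac                # learning rate
        sigma = max(frac, 0.1)           # neighbourhood width on the map
        for x in rng.sample(samples, len(samples)):
            # Best-matching unit: the node closest to the sample.
            bmu = min(range(n_nodes), key=lambda j: math.dist(nodes[j], x))
            for j in range(n_nodes):
                # Nodes near the BMU on the map move with it, but less far.
                influence = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for i in range(len(x)):
                    nodes[j][i] += rate * influence * (x[i] - nodes[j][i])
    return nodes


def som_assign(nodes, x):
    """Return the index of the map node closest to x."""
    return min(range(len(nodes)), key=lambda j: math.dist(nodes[j], x))


# Supervised: labels guide training, then unseen samples are predicted.
centroids = nearest_centroid_fit(samples, labels)
predicted = nearest_centroid_predict(centroids, (0.5, 0.2))

# Unsupervised: the grouping emerges from the data alone.
nodes = som_fit(samples)
assignments = [som_assign(nodes, x) for x in samples]
```

In the supervised step the class names come from the training labels. In the unsupervised step the map only returns node indices, and attaching geological meaning to each node is left to the interpreter.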

As strong advocates for the use of a range of Big Data techniques in applied geosciences, we conclude that the benefits to be gained from the way that we ‘compute’ can be lost if we do not also take considerable care with the ways that we ‘communicate’.

/content/journals/10.1071/ASEG2016ab181
2016-12-01
2026-01-18

  • Article Type: Research Article
Keyword(s): Big Data; Communication; High Dimensional; Machine Learning; Supervised; Unsupervised