Summary

In the field of natural language processing, word embeddings are a family of techniques that map the words of an input corpus into a low-dimensional vector space with the aim of capturing the relationships between words. It is well known that such relationships depend strongly on the context of the input corpus, which in science varies considerably from field to field. In this work we compare the performance of word embeddings pre-trained on generic text with that of custom word embeddings trained on an extensive corpus of geoscientific papers. Numerous examples highlight the difference in meaning and closeness of words between the geoscientific and generic contexts. A prime example is the term ghost, which has a specific definition in geophysics, different from its common usage in the English language. Moreover, domain-specific analogies, such as ‘Compressional is to P-wave what shear is to… S-wave’, are investigated to understand the extent to which the different word embeddings capture the relationships between terms. Finally, we anticipate some use cases of word embeddings aimed at extracting key information from documents and providing better indexing.
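As a minimal sketch of how the nearest-neighbour and analogy queries described above can be run against a trained embedding model, the snippet below uses the gensim KeyedVectors API. The file name geoscience_vectors.kv and the exact lower-cased vocabulary (e.g. "p-wave", "s-wave") are assumptions for illustration, not artefacts of this paper.

```python
from gensim.models import KeyedVectors

# Load previously trained word vectors saved in gensim's KeyedVectors format.
# The file name is purely illustrative; any word2vec/fastText/GloVe vectors
# converted to this format would work the same way.
kv = KeyedVectors.load("geoscience_vectors.kv")

# Nearest neighbours of a domain-loaded term: in a model trained on
# geoscientific text, "ghost" should sit close to seismic-processing
# vocabulary rather than to its everyday English senses.
print(kv.most_similar("ghost", topn=5))

# Analogy query: compressional is to p-wave as shear is to ...?
# Vector arithmetic p-wave - compressional + shear, expected answer: s-wave.
print(kv.most_similar(positive=["p-wave", "shear"],
                      negative=["compressional"],
                      topn=3))
```

Running the same two queries against embeddings pre-trained on generic text versus the custom geoscientific embeddings is one way to make the contrast discussed in the summary concrete.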
