Constructing a Structured Corpus from Geoscience Literature: A Case Study using Western Australia Iron and Lithium Deposits
Abstract
The field of geoscience faces challenges in combining varied data sets and formats to facilitate knowledge discovery. Academic literature often contains unstructured data, which require intricate processing for machine comprehension. Information Extraction can play a vital role in organizing these data effectively, thereby enhancing the functionality of artificial intelligence algorithms. Natural Language Processing (NLP) and Named Entity Recognition (NER), particularly using ontologies, have been pivotal in achieving semantic consistency within integrated datasets. Advances in geospatial artificial intelligence, propelled by deep learning and machine learning, have been crucial in converting vast amounts of spatial data into practical knowledge. However, the reliance on extensively labelled datasets imposes limitations because the labelling process is labour-intensive. This paper presents a case study that applies these methods to journal articles on iron and lithium mineral deposits in Western Australia. The results highlight the efficacy of our approach in transforming unstructured data into structured information, thereby contributing to the advancement of knowledge extraction in geoscience. This study not only underscores the importance of robust data processing strategies but also shows how advanced machine learning and deep learning techniques in geospatial artificial intelligence enable more informed geological research and decision-making.
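As a rough illustration of what turning unstructured text into structured information can look like, the following minimal Python sketch tags entities by matching a sentence against a small term dictionary. It is not the method used in the paper: the `ONTOLOGY` dictionary, its class labels, the function names, and the example sentence are all hypothetical, and simple dictionary matching stands in for the ontology-based NER the abstract describes.

```python
import re

# Hypothetical mini-ontology (illustrative only, not from the paper):
# entity class -> surface terms that belong to it.
ONTOLOGY = {
    "COMMODITY": ["iron", "lithium"],
    "MINERAL": ["hematite", "spodumene"],
    "LOCATION": ["Western Australia", "Pilbara"],
}


def build_pattern(ontology):
    """Compile one case-insensitive regex covering every ontology term."""
    term_to_label = {
        term.lower(): label
        for label, terms in ontology.items()
        for term in terms
    }
    # Longest terms first, so multi-word terms win over shorter overlaps.
    terms = sorted(term_to_label, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return pattern, term_to_label


def extract_entities(text, ontology=ONTOLOGY):
    """Return one structured record per ontology term found in the text."""
    pattern, term_to_label = build_pattern(ontology)
    return [
        {
            "text": match.group(0),
            "label": term_to_label[match.group(0).lower()],
            "start": match.start(),
            "end": match.end(),
        }
        for match in pattern.finditer(text)
    ]


# Hypothetical input sentence.
sentence = "Spodumene is the main lithium ore in Western Australia."
records = extract_entities(sentence)
for record in records:
    print(record)
```

Each record carries the matched text, its class, and character offsets, which is the kind of row a structured corpus could store. A trained NER model would replace the dictionary lookup where term lists alone are not enough.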
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.
Reusers are allowed to copy, distribute, and display or perform the material in public. Adaptations may be made and distributed.