Advancements in multimodal document mining are transforming geoscientific information extraction from heterogeneous sources such as text, tables, and images. This work investigates Retrieval-Augmented Generation (RAG) and GraphRAG approaches, combined with MinerU, a preprocessing tool that preserves document hierarchy and reduces noise, to enhance retrieval and synthesis of geoscientific knowledge.
The methods were evaluated on the Norway Relinquishment Licenses dataset, comprising 781 diverse documents. Five configurations of RAG and GraphRAG were benchmarked on 67 technical questions using four metrics: completeness, correctness, verbosity, and response time. Results show that RAG with a 5000-token chunk size achieved the best combined score (0.498), outperforming both RAG configurations with smaller chunks and GraphRAG Global Search, while GraphRAG Local Search performed better than its global variant. Limitations remain in handling complex tables and image-rich content, highlighting the importance of preprocessing quality.
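To illustrate what the chunk-size parameter controls in a RAG pipeline, the following is a minimal sketch of fixed-size chunking. It is not the implementation used in this work: the function name `chunk_tokens`, the `overlap` parameter, and the use of whitespace splitting as a stand-in for a real tokenizer are all assumptions made for illustration.

```python
def chunk_tokens(text: str, chunk_size: int = 5000, overlap: int = 0) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` tokens.

    Assumption: tokens are approximated by whitespace-separated words;
    a real pipeline would use the tokenizer of its embedding model.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Hypothetical document of 12,000 pseudo-tokens.
    document = " ".join(f"tok{i}" for i in range(12000))
    for size in (1000, 5000):
        pieces = chunk_tokens(document, chunk_size=size)
        print(size, len(pieces))
```

Larger chunks keep more of a report section together in each retrieved passage, at the cost of fewer, coarser retrieval units; the sketch only shows how the number of units changes with chunk size.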
These findings underline the potential of RAG and GraphRAG as assistive tools for geoscientific analysis, while human supervision remains necessary for critical data. Future work will focus on enhancing GraphRAG architectures, extending multimodal reasoning capabilities, and automating workflows to enable more robust, interpretable, and autonomous document mining in geosciences.