Abstract

Geoscience documents are difficult to process with automated information extraction because of their complex layouts, poor visual quality, and diverse table structures. Traditional workflows rely on separate models for layout, text, and table extraction, which leads to error propagation and inconsistent outputs. Large language models, in turn, lack domain-specific understanding, which limits their effectiveness on such documents. To address these issues, we fine-tune a lightweight vision-language model that integrates visual and textual understanding. The model jointly processes text, tables, and figures in a schema-aware manner while maintaining natural reading order and contextual coherence. This unified approach reduces maintenance complexity, minimizes errors, and improves consistency across geoscience document elements.

/content/papers/10.3997/2214-4609.202639040
2026-03-09
2026-02-13