Summary

Fossils are crucial in geology, providing evidence for stratigraphic correlation and paleoenvironmental reconstruction. Traditional fossil records, however, often lose the link between specimen details and collection points, which limits their scientific utility, and large amounts of valuable fossil data remain locked in legacy reports, handwritten notes, and scanned charts. To address this, a novel AI-driven framework integrates Large Vision Models (LVMs), Large Language Models (LLMs), and GIS to extract and process fossil data from over 7,000 legacy documents. The methodology employs YOLO-based object detection to identify key terms, such as formation names and faunal data, followed by Handwritten Text Recognition (HTR) and LLM-based refinement. The approach resolved 2,410 fossil localities across Saudi Arabia. Quality control measures, including OCR error correction and validation against benchmarks, yielded a Word Error Rate of 5%. LVMs such as Qwen2-VL and TrOCR enabled extraction from both handwritten and printed records, while duplication analysis reduced redundancy. AI-assisted geographic reasoning, which integrated textual descriptions with quadrangle maps, expanded the locality data by 1,474 points. Ultimately, 98.5% of localities were geographically verified, demonstrating the framework's potential to transform inaccessible fossil archives into structured, standardized datasets for future scientific research.
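
The Word Error Rate quoted above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a recognized transcript and a reference transcript, divided by the number of words in the reference. The summary does not state how the authors computed it, so the following is only a minimal sketch of the standard definition; the function name and the sample strings are hypothetical and not taken from the study's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misrecognized word out of five reference words.
reference_text = "Dhruma Formation middle Jurassic ammonites"
recognized_text = "Dhruma Formation middle Jurasic ammonites"
print(word_error_rate(reference_text, recognized_text))
```

Under this definition, a Word Error Rate of 5% corresponds to roughly one word-level error per twenty reference words.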

2026-03-09
2026-02-15