FARR Mini-Research Project Trains AI Model to Read Geological Maps

lbschreiber
Sep 26, 2025

Written by Jack Imel, University of Alaska Southeast Intern at SDSC

University of Idaho researchers fine-tune EfficientNetB0 for geological object recognition in legacy maps through data augmentation and transfer learning techniques

Thanks to funding from the U.S. National Science Foundation (NSF) FARR (FAIR in ML, AI Readiness and Reproducibility) Research Coordination Network, a University of Idaho research team used data augmentation and transfer learning to improve the ability of an AI model, EfficientNetB0, to recognize geological objects in maps, including rock types, faults, folds, and stratigraphic units. Wenjia Li, Weilin Chen, Jiyin Zhang, Chenhao Li and Xiaogang Ma worked to provide more precise recognition and classification of geological objects across an array of maps.

“Maps are crucial tools in geosciences, providing detailed representations of the spatial distribution and relationships among geological features,” said W. Li. “Accurate recognition and classification of geological objects within these maps are essential for applications in resource exploration, environmental management, and geological hazard assessment.”

After more than a century of cartography across a variety of geological contexts, the notation used for geological objects varies widely. Manually identifying geological objects in maps of varying quality, context, and era is time-consuming and labor-intensive, and the team demonstrated how AI can accomplish the task more efficiently. However, their work also showed that training datasets are limited by inconsistent and incomplete annotations in legacy maps, which in turn limits the AI’s object recognition capability. To remedy this issue and achieve more robust recognition, the researchers used data augmentation.

“Data augmentation is a technique wherein synthetic data is used to supplement a dataset that is small or lacking in diversity,” Ma said. “By augmenting existing datasets with synthetic data, we can create larger and more diverse training sets, improving the performance and robustness of machine learning models.”

To build a synthetically augmented dataset, the researchers first preprocessed existing maps, using optical character recognition (OCR) and OpenCV to remove text annotations and digital elevation model (DEM) data to overlay topography. Removing the text annotations stopped the AI from identifying geological objects from text clues, while the topographic overlay added a layer of realism to the augmented dataset.
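The article does not say how the text removal was implemented. As a rough illustration only, the sketch below assumes Tesseract (through the pytesseract package) for OCR and OpenCV inpainting to paint over the detected words. The function names, the confidence threshold, the padding, and the choice of inpainting are all assumptions made for this example, not the team's published method.

```python
def text_boxes(ocr, min_conf=60.0, pad=2):
    """Turn pytesseract-style word data into padded (x0, y0, x1, y1) boxes.

    `ocr` is a dict of parallel lists, as returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    min_conf and pad are illustrative values, not taken from the study.
    """
    boxes = []
    for text, conf, x, y, w, h in zip(ocr["text"], ocr["conf"], ocr["left"],
                                      ocr["top"], ocr["width"], ocr["height"]):
        # Tesseract reports conf = -1 for layout elements that are not words.
        if text.strip() and float(conf) >= min_conf:
            boxes.append((max(x - pad, 0), max(y - pad, 0),
                          x + w + pad, y + h + pad))
    return boxes


def remove_text(image_bgr, min_conf=60.0):
    """Detect words with Tesseract and inpaint over them with OpenCV."""
    # Third-party packages, imported lazily so text_boxes works without them.
    import cv2
    import numpy as np
    import pytesseract

    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)  # pytesseract expects RGB
    ocr = pytesseract.image_to_data(rgb, output_type=pytesseract.Output.DICT)

    # Build a mask that is 255 wherever a confident word was found.
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    for x0, y0, x1, y1 in text_boxes(ocr, min_conf):
        cv2.rectangle(mask, (x0, y0), (x1, y1), 255, thickness=-1)

    # Fill the masked pixels from their surroundings.
    return cv2.inpaint(image_bgr, mask, 3, cv2.INPAINT_TELEA)
```

Under these assumptions, `remove_text(cv2.imread("map.png"))` would return a copy of a (hypothetical) map image with recognized labels painted over, so a classifier has to rely on the symbols and patterns rather than the text.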

The team demonstrated that simulating the intricate interaction of topography and geological features increases the authenticity and relevance of the training dataset, making the results more useful for both academic research and practical geological tasks. Next, each image was rotated by three randomly selected angles between 0 and 180 degrees to train the AI model to recognize symbols regardless of their orientation. Then, geological line features such as faults, bedding planes, and joints were randomly generated to supplement the limited number of such features in many geological maps. Finally, image noise was added to simulate the effects of aging and scanning errors on geological maps.
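The three augmentation steps described above (random rotation, synthetic line features, and noise) can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the team's code: it assumes a grayscale image stored as a list of rows, draws plain straight segments as stand-ins for faults or joints, uses Gaussian noise with an arbitrary `sigma`, and chains the three steps on each variant, which the article does not confirm.

```python
import math
import random


def rotate(img, angle_deg, fill=255):
    """Nearest-neighbour rotation about the image centre (same output size)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel for each output pixel.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def add_random_line(img, rng, value=0):
    """Draw one straight segment between two random points."""
    h, w = len(img), len(img[0])
    x0, y0 = rng.randrange(w), rng.randrange(h)
    x1, y1 = rng.randrange(w), rng.randrange(h)
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    out = [row[:] for row in img]
    for i in range(steps + 1):
        t = i / steps
        out[round(y0 + t * (y1 - y0))][round(x0 + t * (x1 - x0))] = value
    return out


def add_noise(img, rng, sigma=10.0):
    """Add Gaussian noise to every pixel and clip to the 0-255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def augment(img, seed=0, n_rotations=3):
    """Return n_rotations variants: rotated, with an extra line, then noised."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_rotations):
        variant = rotate(img, rng.uniform(0, 180))  # angle in [0, 180] degrees
        variant = add_random_line(variant, rng)
        variant = add_noise(variant, rng)
        variants.append(variant)
    return variants
```

Calling `augment(image)` on one legend symbol yields three new training images. A production pipeline would more likely use an image library for rotation and would generate line features that follow geological symbology rather than plain segments.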

Ultimately, data augmentation expanded the original dataset of 2,130 images fourteenfold, to 29,820 images. While this more comprehensive dataset contributed to the model’s accurate performance, the next step was to train the model using transfer learning, a method that reuses a pre-trained model as the starting point for a new classification model.

“In our case, transfer learning allowed us to fine-tune model parameters on our target data,” Ma said. “Our study utilized legend data to train the pre-trained EfficientNetB0 model. Then, we added a fully connected neural network layer to fine-tune the model for geological map processing, which ensured specific adaptation for accurate recognition and classification of geological features.”

In other words, EfficientNetB0 is an AI model that was pre-trained to identify a variety of object classes, and in this study, it was adapted to recognize geological features by first training on map legends. Then, a fully connected neural network layer was incorporated to classify the features into specific categories based on individual map legend symbols and annotations.
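For readers curious what this looks like in code, here is a minimal sketch of the general pattern, assuming TensorFlow/Keras (the article does not name the framework the team used). The function name, the 224 x 224 input size, the optimizer settings, and the choice to freeze the backbone are illustrative assumptions.

```python
def build_legend_classifier(num_classes, weights="imagenet", freeze_backbone=True):
    """EfficientNetB0 backbone plus one fully connected softmax layer."""
    # Imported lazily so this file can be loaded without TensorFlow installed.
    import tensorflow as tf

    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False,          # drop the original ImageNet classifier
        weights=weights,            # "imagenet" reuses pre-trained parameters
        input_shape=(224, 224, 3),  # EfficientNetB0's default input size
        pooling="avg",              # global average pooling -> feature vector
    )
    # Assumption: keep the pre-trained features fixed and train only the head.
    backbone.trainable = not freeze_backbone

    inputs = tf.keras.Input(shape=(224, 224, 3))
    features = backbone(inputs, training=False)  # batch norm in inference mode
    # The added fully connected layer: one output per legend category.
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With this sketch, `build_legend_classifier(num_classes=12)` would return a model ready for `model.fit(...)` on labeled legend images. The value 12 is a placeholder, since the number of categories depends on the legends of the maps being processed.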

With the use of transfer learning and data augmentation, the AI model was able to identify geological objects with up to 70% accuracy.

“This study presents an innovative workflow to enhance geological object recognition by leveraging legend data for data augmentation and utilizing the EfficientNetB0 deep learning model,” said FARR-RCN Principal Investigator Christine Kirkpatrick, director of the San Diego Supercomputer Center’s Research Data Services Division. “In short, the team emphasized the significance of data augmentation in improving model generalization across different geological contexts, leading to better recognition and classification of geological objects, which was one of the goals of our mini research grant program.”

The study was published in the journal Applied Computing and Geosciences.
