
AI-Ready Data: Navigating the Dynamic Frontier of Metadata and Ontologies - A Workshop Summary

  • lbschreiber
  • Jul 10, 2024
  • 3 min read

Written by Jane Greenberg, Drexel University


AI-ready data, which refers to high-quality and well-prepared data that is optimized for use in artificial intelligence (AI) applications, increasingly encompasses the inclusion of metadata and ontologies to enhance value and usability. While metadata provides essential context and information about data, ontologies offer structured semantic representation of a particular domain. These additional layers of information help data scientists, researchers, and AI systems understand, interpret, and apply appropriate algorithms and models for analysis. Metadata and ontologies enable consistent data integration, interoperability, and knowledge sharing across systems - while facilitating more knowledgeable AI applications. Additionally, these systems are proving vital for supporting the FAIR (Findable, Accessible, Interoperable, and Reusable) Principles and reproducible computational research.
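To make this concrete, a dataset description of the kind discussed here can be expressed as a machine-readable record using schema.org's Dataset vocabulary serialized as JSON-LD. The sketch below is purely illustrative — the dataset name, DOI, and field values are hypothetical — but it shows how explicit metadata fields (identifier, license, keywords) directly support the FAIR principles:

```python
import json

# Illustrative (hypothetical) dataset description using schema.org's
# Dataset type, serialized as JSON-LD. Records like this give both
# humans and AI pipelines the context needed to interpret the data.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example river-gauge measurements",       # hypothetical dataset
    "description": "Hourly water-level readings, 2020-2023.",
    "identifier": "https://doi.org/10.0000/example",  # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["hydrology", "time series"],
    "variableMeasured": "water level (m)",
}

# Minimal check for fields that underpin FAIR: a persistent identifier
# (Findable), a license (Reusable), and keywords (Findable).
REQUIRED = ("identifier", "license", "keywords")
missing = [field for field in REQUIRED if field not in record]

print(json.dumps(record, indent=2))
print("missing FAIR-relevant fields:", missing or "none")
```

A real pipeline would validate such records against a community metadata schema or ontology rather than a hard-coded field list, but even this minimal structure is enough for search engines and AI tooling to discover and contextualize the dataset.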


Despite these capacities, approaches for developing, implementing, and sustaining metadata and ontologies within AI-ready data pipelines remain inconsistent and cumbersome, and lack sufficient support. Challenges span the full data lifecycle, from data creation, collection, and research to the longer-term aims of data preservation, archiving, reuse, and support for research reproducibility. Collective, community-driven efforts are needed to address current obstacles and maximize the value and reliability of data. The AI-Ready Data: Navigating the Dynamic Frontier of Metadata and Ontologies two-day workshop, held in April at Drexel University, served as a viable step toward addressing this challenge.


Sponsored by the Institute for Data-Driven Dynamical Research, the workshop was hosted by the Metadata Research Center at Drexel and brought together more than 50 individuals with expertise across the data lifecycle to discuss issues, share solutions, and chart a path forward for addressing key challenges in preparing AI-ready data for scientific research. Participants came from the five NSF-HDR institutes, as well as other NSF initiatives (e.g., Big Data Hubs, Open Knowledge Network, FAIR OS RCNs, Research Data Alliance), industry, federal agencies (National Institute of Standards and Technology, NIST), and two U.S. National Laboratories (Oak Ridge National Laboratory and Pacific Northwest National Laboratory).


Workshop participants shared case studies, methods, and goals for incorporating metadata and ontologies into AI-ready data frameworks. They also gathered in a series of breakout groups to discuss AI-ready data approaches, needs, and opportunities interconnecting with metadata and ontologies. 


Christine Kirkpatrick, PI of the FAIR in ML, AI Readiness, & Reproducibility Research Coordination Network (FARR RCN), presented at the Data Management, FAIR practices, and Prepping for AI-Ready Pipelines session.

 

Some key takeaways from the overall workshop include the following:

  • Incorporating metadata and ontologies into an AI-ready data framework may prove crucial for accelerating knowledge discovery and supporting longitudinal science.

  • AI methods, including generative AI, can leverage metadata and ontologies to improve and validate AI-ready data.

  • FAIR data is a component of AI-ready data, although it is important to recognize that not all FAIR data is AI-ready.

  • Use cases can guide in determining the level of AI readiness necessary for data.

  • Metadata- and ontology-informed AI-ready data techniques developed for domain-specific data are applicable across other domains.

  • More attention needs to be given to the AI-ready data spectrum, including DevOps training, AI readiness levels, and stakeholder engagement. 

  • Training: Introducing undergraduates to metadata and data practices has the potential for tremendous impact on preparing AI-ready data and improving overall data representations long term.

  • AI-readiness levels: Progress codifying AI-readiness levels (e.g., ESIP; iHARP work presented by Sanjay Purushotham, UMBC; and existing machine learning schemes) can further inform work on integrating metadata and ontological systems into AI-ready data pipelines.


On the closing day, when breakout groups presented their final reports, participants wanted to keep working on their projects, with many staying afterwards to ask whether a follow-up workshop would occur soon. Although no plans are set for a follow-up workshop, the conversation on fully AI-ready data will continue at the in-person 2024 FAIR in ML, AI Readiness & Reproducibility Workshop at the AGU Conference Center in Washington, D.C., on October 9-10, 2024.





This work is supported through the National Science Foundation award # 2226453.
