AIDR 2019 Call for Submissions

Supported by the NSF scientific data reuse initiative, AIDR 2019: Artificial Intelligence for Data Discovery and Reuse is a conference that seeks innovative solutions to accelerate the dissemination and reuse of scientific data in the data revolution. The explosion in the volume of scientific data has made it increasingly challenging to find data scattered across various platforms. At the same time, a growing number of new data formats, greater data complexity, and a lack of consistent data standards across disciplines, of metadata, and of links between data and publications make it even more challenging to evaluate data quality, reproduce results, and reuse data for new discoveries. AIDR 2019 provides a platform for AI/ML researchers, data professionals, and scientists to come together, benefit from mutual expertise, address these data challenges, and facilitate the next breakthroughs in science and technology using the power of AI and scientific data. Quality submissions from academia, industry, and other communities are all welcome.

Important Dates

(All deadlines are 11:59pm EST)

Submission deadline: February 22, 2019 (previously February 1, 2019)

Author notification: March 8, 2019 (previously February 28, 2019)

Early bird registration deadline: March 22, 2019 (previously March 15, 2019)

Conference: May 13-15, 2019

Submission Guidelines

The conference program committee invites abstract submissions for presentations and panels that address applications of AI/ML to challenges in the discovery, reuse, and management of data across disciplinary domains. Innovative algorithms, tools and platforms, successful use cases, and biomedical applications are of great interest. Submissions can be entered as plain text in the web form on EasyChair; you will need to have or create an EasyChair account to proceed to the submission.

We are accepting Abstract submissions (max. 400 words) for the following formats:

  • Long Talks: Authors of accepted abstracts will be invited to present at one of the plenary sessions. The content of the talk should focus on original research and finished work. Talks will range in length from 30 to 45 minutes.
  • Short Talks: Authors of accepted abstracts will be invited to present at one of the lightning or brief talk sessions. Submission of relevant work in progress is encouraged for short talks. Talks will range in length from 10 to 20 minutes.
  • Posters: Authors of accepted abstracts will be invited to present during a poster session.
  • Panel Discussions: Submissions for panel discussions should include background and significance of the topic being discussed, and a (preliminary) list of panelists. Panel Discussions should focus on topics that engage different communities and stimulate discussions on emerging topics, technical trends, and pressing issues.

Student Competitions: All talks and posters for which an undergraduate or graduate student is a lead author will be included in competitions for the best long talk, best short talk, and best poster awards, evaluated by reviewers identified by the program committee.

List of Topics

Topics that the conference will address include but are not limited to:

  • Automating data discovery: Going beyond simple search, how can AI help find data that are described in different ways, languages, and formats?
  • Automating data curation and metadata generation: How can AI provide more robust and more precise tools for provenance, data-driven metadata, (machine-readable) documentation, and curation?
  • Measuring and improving data quality: How can AI help assess data quality, recommend improvements and develop tools for making them, and measure data quality so that it can be taken into account when data are reused?
  • Integrating datasets: As relevant data are discovered, how can AI help with their fusion, including factors such as ontology, format, units, and language?
  • Enabling interpretability: How can AI contribute to the representation of information that is both machine readable and human readable?
  • Measuring data metrics and citation: How are data citations tracked and linked to publications? How is the impact of research data evaluated?
  • Data privacy, security and algorithmic bias: What are their ethical implications and how can they be avoided?
  • Collaborating across disciplinary and expertise domains: How do professionals from different domains work together?

Areas with great potential to address these challenges include but are not limited to:

  • Natural language processing to aid in the discovery and interpretation of data, information extraction, and the generation of ontologies, taxonomies, and knowledge bases
  • Inference of data types and where they fit into ontologies, and automatically creating more precise, machine-readable metadata from that information
  • Measuring, reporting, and improving data quality through identification and potential cleaning of missing or possibly erroneous values
  • Inference-based conversion of formats, ranging from simple cases such as unit conversions to more complex cases such as working with data represented using different geographical frames
  • Tools for visually representing data quickly and intuitively, to help with understanding unfamiliar datasets and the results of analytics
  • Human-in-the-loop methods applied to all of the above for semi-supervised training, potentially leading to greater degrees of autonomy

Abstract Review Process

Abstracts and proposals will be reviewed by the program committee, which consists of experts in the areas of AI, scientific computing, computational biology, research data management, and information science and technology. Contributions will be selected based on quality and relevance. Notification of acceptance will be made by February 28, 2019.

Committee Members

General Chair:

Huajin Wang, Carnegie Mellon University

Program Co-chairs:

Keith Webster, Carnegie Mellon University
Nick Nystrom, Pittsburgh Supercomputing Center
Huajin Wang, Carnegie Mellon University

Program Committee:

Paola Buitrago, Pittsburgh Supercomputing Center
Sayeed Choudhury, Johns Hopkins University
Sean Davis, National Cancer Institute
Fei Fang, Carnegie Mellon University
Andreas Pfenning, Carnegie Mellon University

Organizing Committee:

Neelam Bharti, Carnegie Mellon University
Michelle Delvin, Pittsburgh Supercomputing Center
Ann Marie Mesco, Carnegie Mellon University
Sarah Young, Carnegie Mellon University


All questions about submissions should be emailed to or

For more information and to register, visit the event website: