Enhancing Reuse of NHGRI Cohort Data with GA4GH Genomic Knowledge Standards
National Human Genome Research Institute
Description/Abstract
Large genomic cohorts are a key tool for identifying genetic factors that contribute to rare diseases, but genomic cohort datasets are scattered and heterogeneous, which limits their interoperability for synthetic cohort construction. The NHGRI’s AnVIL platform provides access to diverse genomic resources, including data from the GREGoR and CSER consortia, which focus on rare and undiagnosed diseases. However, integrating and analyzing data across multiple cohorts remains a challenge because of differences in variant representation and data structure. This project aims to develop computational methods and workflows that overcome these challenges and support standardized multi-cohort analyses, enabling efficient reuse of AnVIL data for rare disease research.

Aim 1 focuses on enhancing workflows for multi-cohort data integration by extending our existing tools for standardized cohort allele frequency (CAF) generation from GREGoR cohort data. We will implement a plugin-based architecture to accommodate diverse dataset structures, develop indexing strategies to efficiently integrate variant data, and enable phenotype-informed synthetic cohort creation. This work will provide tools that are interoperable across studies, supporting scalable and reproducible genomic analyses.

Aim 2 will evaluate the utility of synthetic multi-dataset cohorts for studying diseases suspected to have a rare genetic cause. We will address potential issues associated with multi-dataset cohort construction, such as the presence of duplicate patient samples across datasets. We will assess the combined CSER and GREGoR consortia data using one-sided matching strategies, and compute ACMG variant pathogenicity evidence codes so that these AnVIL data can be reused in downstream variant interpretation workflows. This work will assess and refine best practices for rare disease research using synthetic multi-dataset cohorts.
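To make the cohort allele frequency idea in Aims 1 and 2 concrete, the following is a minimal Python sketch of pooling per-cohort allele counts into one frequency per variant. It is an illustration under stated assumptions, not the project's tooling: the `AlleleCount` record, its field names, the `pool_allele_frequencies` function, and the sample identifiers are all hypothetical, and the sketch assumes variants have already been normalized to a shared identifier and that no individual appears in more than one cohort.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleCount:
    """Hypothetical per-cohort count record for one variant."""
    variant_id: str     # assumed: identifier shared across cohorts after normalization
    cohort: str         # illustrative cohort label
    allele_count: int   # copies of the allele observed in this cohort
    allele_number: int  # total called alleles at the locus in this cohort


def pool_allele_frequencies(records):
    """Sum counts per variant across cohorts and return pooled frequencies.

    Assumes cohorts do not share samples; duplicates would inflate both sums.
    """
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        totals[r.variant_id][0] += r.allele_count
        totals[r.variant_id][1] += r.allele_number
    # Guard against a locus with no called alleles.
    return {v: (ac / an if an else 0.0) for v, (ac, an) in totals.items()}


records = [
    AlleleCount("var-A", "cohort-1", 2, 200),
    AlleleCount("var-A", "cohort-2", 1, 100),
    AlleleCount("var-B", "cohort-1", 0, 200),
]
pooled = pool_allele_frequencies(records)
```

Summing counts before dividing weights each cohort by its number of called alleles, rather than averaging per-cohort frequencies. The no-shared-samples assumption is the reason duplicate detection, as described in Aim 2, matters before pooling.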
Aim 3 focuses on disseminating tools and methods to the broader research community. We will develop training materials, Jupyter notebooks, and protocol papers, provide hands-on demonstrations at AnVIL and GA4GH meetings, and contribute to genomic standards development. These efforts will promote widespread adoption of standardized workflows for rare disease genomics research.

By improving interoperability and scalability of multi-cohort analyses on AnVIL, this project will enhance the discovery and interpretation of rare disease variants, advancing genomic medicine and precision health.

Project Number: 1R03HG014803-01 | Fiscal Year: 2026 | NIH Institute/Center: National Human Genome Research Institute (NHGRI) | Principal Investigator: Alex Wagner (+1 co-PI) | Institution: RESEARCH INST NATIONWIDE CHILDREN'S HOSP, COLUMBUS, OH | Award Amount: $407,000 | Activity Code: R03 | Study Section: Special Emphasis Panel [ZRG1 BBBT-U (56)]
View on NIH RePORTER: https://reporter.nih.gov/project-details/11284336
Grant Details
$407,000 - $407,000
April 30, 2028
COLUMBUS, OH