Sequence-to-function models for mechanistic investigations of personal genomes
National Human Genome Research Institute
Description
Recent advances in functional genomics, computational modeling, and hardware have ushered in a new era of sequence-to-function (S2F) deep learning models, which predict functional outcomes directly from DNA sequence. We are poised to build comprehensive models of gene regulation and to precisely predict the gene expression consequences of arbitrary genotypes. However, two significant challenges limit the potential of current models. First, current models cannot correctly predict the subtle gene expression consequences that arise from natural genetic variation. Second, current models make accurate predictions only for cellular contexts with abundant functional genomic training data, and are not applicable to contexts important for many diseases, such as early developmental time points and difficult-to-isolate cell types. Our proposal aims to address these challenges and advance the field.
Aim 1 develops S2F models that perform well across the full spectrum of natural genetic variation. Current S2F models generalize across genomic regions but struggle to generalize across the diverse genotypes represented in personal genomes, because the resulting gene expression changes are subtle. We propose a novel and general learning framework that integrates allele-resolved functional genomic data, enhancing the model's ability to predict variant effects. This framework employs custom loss functions and learning algorithms to make efficient use of personal genomes and allele-resolved datasets.
Aim 2 introduces modularized model architectures that improve generalization and adaptability, particularly in scenarios with limited data. Gene expression regulation involves complex processes, each with a distinct relationship to DNA sequence. We propose factorized models that use biological prior knowledge to constrain the model and decouple its parameters so that they correspond to distinct biological processes. Regularizing the model with known biological principles enhances generalization, yields interpretable predictions, and enables context-specific fine-tuning in data-limited settings.
Aim 3 applies these enhanced models to study the regulatory mechanisms underlying neurodevelopmental disease. Collaborating with domain experts, we will investigate how sequence variation affects gene expression in autism spectrum disorder (ASD) and schizophrenia (SCZ). ASD and SCZ involve abnormalities in brain development that may occur even before birth. Hence, functional data from the relevant cell types and developmental time points are limited. We will apply our models to whole-genome sequencing data collected from SCZ and ASD patients to predict gene expression values genome-wide, across brain cell types and early developmental time points. By associating the resulting imputed gene expression values with disease outcome, our approach will drastically reduce the multiple testing burden that currently hampers the investigation of rare non-coding variants in complex disease.
In summary, our methods will offer powerful computational tools to study and interpret personal genomes, enabling mechanistic investigation of the full spectrum of genetic variation in complex disease.
Project Number: 1R01HG013724-01A1 | Fiscal Year: 2025 | NIH Institute/Center: National Human Genome Research Institute (NHGRI) | Principal Investigator: Sara Mostafavi (+2 co-PIs) | Institution: UNIVERSITY OF WASHINGTON, SEATTLE, WA | Award Amount: $2,217,249 | Activity Code: R01 | Study Section: Genetic Variation and Evolution Study Section [GVE]
View on NIH RePORTER: https://reporter.nih.gov/project-details/11129585
Grant Details
$2,217,249 - $2,217,249
August 31, 2029
SEATTLE, WA