Computational methods for phasing biobank sequence data
National Human Genome Research Institute
Description
Next-generation biobanks will contain sequence data for millions of individuals. This project will develop genotype phasing methods for these immense data sets. Genotype phasing estimates an individual's haplotypes, which are the two sequences of alleles that are inherited from the parents. Genotype phasing is necessary in order to perform powerful haplotype-based analyses. This project will develop computational methods that substantially reduce the cost of genotype phasing, so that it is possible to phase large biobanks for an acceptable cost. In addition, the project will develop methods that allow marker filtering, sample filtering, and haplotype-based analyses to be performed directly on highly compressed data. This will remove the need to decompress phased genomic data prior to data filtering or haplotype-based analyses. This project will develop two methods that significantly improve genotype phase accuracy. The first method will identify difficult-to-phase heterozygous genotypes at run time and apply a special phasing algorithm to these heterozygotes. The second method will increase the number of genetic markers available for haplotype-based analyses by allowing less stringent quality control filters to be applied without harming genotype phase accuracy. Finally, the project will develop an open-source pipeline for phasing All of Us Research Program sequence data, and it will apply this pipeline to phase each sequence data release. The phased sequence data will be a shared resource that enables researchers to perform powerful haplotype-based analyses of All of Us genomic data, which will increase the power to detect genomic variants that influence heritable traits and diseases.
Project Number: 1R01HG014458-01 | Fiscal Year: 2025 | NIH Institute/Center: National Human Genome Research Institute (NHGRI) | Principal Investigator: BRIAN BROWNING | Institution: UNIVERSITY OF WASHINGTON, SEATTLE, WA | Award Amount: $2,324,487 | Activity Code: R01 | Study Section: Biodata Management and Analysis Study Section [BDMA] | View on NIH RePORTER: https://reporter.nih.gov/project-details/11200107
Grant Details
$2,324,487
August 31, 2029
SEATTLE, WA