SANTA CRUZ, CA

Pangenome-Aware Methods for Accurate Somatic Variant Discovery in Cancer Genomics

National Cancer Institute

Description

Accurate detection of somatic variants in cancer genomes remains significantly more challenging than germline variant detection, with typical error rates an order of magnitude higher. Multiple factors contribute to this disparity, including tumor heterogeneity, aneuploidy, widespread structural variation, and cross-sample contamination. However, additional key factors impeding progress include insufficient benchmark data for training and testing methods, limited adoption of long-read sequencing technologies, and reliance on linear reference genomes that introduce reference bias. We propose to address these challenges through three complementary aims.

First, we will expand our existing Cancer Standards Long-read Evaluation (CASTLE) collection to twelve tumor-normal cell line pairs, sequencing each with multiple technologies including Illumina, Oxford Nanopore, and PacBio HiFi. We will generate complete telomere-to-telomere germline genome assemblies for each line and create comprehensive benchmark variant sets validated across technologies. All data will be openly released without access restrictions.

Second, we will create new versions of our DeepSomatic variant caller that incorporate pangenome information by: (1) using pangenome-based read mapping to reduce reference bias, (2) incorporating complete haplotype information from the Human Pangenome Reference Consortium into variant inference, and (3) utilizing personalized pangenome references imputed from sequencing data.

Third, we will extend our Severus structural variant caller to work with both complete germline assemblies and pangenome references, exploring multiple approaches including direct mapping to diploid assemblies, mapping to merged diploid pangenome graphs, and using personalized pangenome references with imputed haplotypes.

The successful completion of these aims will provide essential benchmark data enabling further method development, improved methods for detecting both small variants and structural variants in cancer genomes, and standardized variant call sets for major cancer genomics projects. Our team brings together leading expertise in pangenomics, machine learning, and cancer genomics, positioning us to successfully execute this ambitious program.

Project Number: 1U01CA309342-01
Fiscal Year: 2026
NIH Institute/Center: National Cancer Institute (NCI)
Principal Investigator: Benedict Paten
Institution: UNIVERSITY OF CALIFORNIA SANTA CRUZ, SANTA CRUZ, CA
Award Amount: $593,867
Activity Code: U01
Study Section: Special Emphasis Panel [ZRG1 MGG-W (50)]
View on NIH RePORTER: https://reporter.nih.gov/project-details/11294518

Grant Details

Funding Range

$593,867 - $593,867

Deadline

April 30, 2029

Geographic Scope

SANTA CRUZ, CA

Status

open