Description

Identifying and studying the non-coding genomic sequences responsible for gene regulation (e.g., enhancers) is an essential adjunct to sequencing the genomes themselves. Enhancers serve as critical nodes in gene regulatory networks and as such play a major role in determining how genotype influences phenotype; enhancers strongly impact disease, homeostasis, evolution, and, in particular, development. We have developed a computational machine-learning approach, SCRMshaw, for discovering unidentified enhancers. In our own work we have primarily applied SCRMshaw to the discovery of enhancers and investigation of gene regulatory networks in the fly Drosophila melanogaster and a diverse set of insect species (spanning >360 million years); studies of enhancers in Drosophila historically have had an outsized impact on our understanding of these important regulatory elements and of gene regulation in general. Importantly, however, we have also shown that when trained on phylogenetically appropriate data, SCRMshaw is equally effective at predicting non-insect (including mammalian) enhancers. While the SCRMshaw approach is effective, the software pipeline itself has many deficiencies, limiting its adoption by others and complicating efforts to keep it maintained and stable. The code is outdated and a mash-up of different programming languages, and the software is poorly engineered, leading to burdensome storage and memory requirements. Data from SCRMshaw are maintained as part of the REDfly database of transcriptional regulatory elements, but the SCRMshaw data are not easily searchable and not properly integrated into this powerful resource. This proposal will significantly enhance the sustainability and impact of the SCRMshaw software. Modernization and refactoring of the SCRMshaw code will make the software more stable, sustainable, and easily adoptable by others. Outdated Perl code will be converted to Python, and C++ code to Rust.
All of the code will be refactored to remove unnecessary portions and to enable more sophisticated parallelization, efficiency, and storage management. Docker and Apptainer (Singularity) containers and Snakemake workflows will be developed to facilitate adoptability and reproducibility. Improvements to the SCRMshaw results database and integration with the fully featured REDfly knowledgebase will make SCRMshaw data significantly more FAIR (findable, accessible, interoperable, and reusable) and strengthen both resources to provide a powerful platform for comparative regulatory genomics, one of SCRMshaw’s major advantages for understanding enhancer function and gene regulatory networks. The proposed software development will enable us to increase the user community for SCRMshaw and help make results using the software more reproducible. This will in turn enable extensive new enhancer discovery, leading to important new insights into mechanisms of gene regulation, gene regulatory networks, and development.

Project Number: 1R03HD119839-01
Fiscal Year: 2026
NIH Institute/Center: Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
Principal Investigator: MARC HALFON
Institution: STATE UNIVERSITY OF NEW YORK AT BUFFALO, AMHERST, NY
Award Amount: $240,150
Activity Code: R03
Study Section: Special Emphasis Panel [ZRG1 MCST-E (51)]
View on NIH RePORTER: https://reporter.nih.gov/project-details/1R03HD11983901

Grant Details

Funding Range

$240,150 - $240,150

Deadline

January 31, 2028

Geographic Scope

AMHERST, NY

Status

open
