ERI: A Multimodal Embedding Framework for Enhanced In Silico Antimicrobial Peptide Discovery
National Science Foundation
Description
Antibiotic resistance threatens human health. It limits treatment options for infections and compromises the ability to perform surgical procedures safely. Most antibiotics are small molecules that kill bacteria or limit their growth. Peptides are short protein fragments, usually between 2 and 30 amino acids in length. Some result from digestion of larger proteins. Others are produced by the immune systems of many organisms, including humans. Antimicrobial peptides (AMPs) have been shown to have activity against bacteria, viruses, fungi, and parasites. They are a promising alternative to traditional antibiotics, but there are several challenges to their use. These include instability and, in some cases, toxicity to human cells. This project will employ advanced AI techniques to design AMPs with suitable properties. Training undergraduates in AI methodologies will contribute to the development of the future scientific workforce.
The sequence-to-function problem in AMP discovery will be addressed by developing a multimodal representation-learning framework. It will combine sequence embeddings from pre-trained Peptide Language Models (PLMs) with structural and physicochemical properties. Standard AMP classifiers treat peptides without experimental validation as negative examples. However, untested sequences may still exhibit antimicrobial activity, so labeling them as negatives biases model training and compromises the classifier's overall accuracy. Therefore, the workflow will treat AMP prediction as a Positive-Unlabeled (PU) learning task. PU learning uses calibrated class-prior estimation to better identify candidate AMPs from large databases. This workflow could accelerate the development of effective antimicrobial candidates and address public health needs regarding drug-resistant pathogens. After embedding and classification, the geometry of peptide space will be analyzed to identify under-sampled functional regions. This should provide insight into structure-activity relationships.
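As an illustration of what calibrated class-prior estimation can look like in a PU setting, the sketch below follows the Elkan and Noto (2008) approach. This is an assumption for illustration only: the abstract does not say which estimator the project will use. It supposes a separate classifier has already produced scores g(x), the probability that a peptide is labeled as an AMP, and that known AMPs were labeled completely at random. All function names and the example scores are hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_label_frequency(held_out_positive_scores: Sequence[float]) -> float:
    """Estimate c = P(labeled | active) as the mean of g(x) over held-out known AMPs."""
    if not held_out_positive_scores:
        raise ValueError("need at least one held-out positive score")
    c = mean(held_out_positive_scores)
    if not 0.0 < c <= 1.0:
        raise ValueError("scores must be probabilities with a positive mean")
    return c


def calibrated_activity(score: float, c: float) -> float:
    """Turn g(x) = P(labeled | x) into an estimate of P(active | x) = g(x) / c."""
    return min(1.0, score / c)


def unlabeled_positive_weight(score: float, c: float) -> float:
    """Estimate P(active | x, unlabeled) = ((1 - c) / c) * g(x) / (1 - g(x))."""
    if score >= 1.0:
        return 1.0
    return min(1.0, (1.0 - c) / c * score / (1.0 - score))


def hidden_positive_fraction(unlabeled_scores: Sequence[float], c: float) -> float:
    """Estimated share of the unlabeled pool that is actually active."""
    return mean([unlabeled_positive_weight(s, c) for s in unlabeled_scores])


if __name__ == "__main__":
    # Made-up scores, purely illustrative; real scores would come from a
    # classifier trained to separate labeled AMPs from unlabeled peptides.
    c = estimate_label_frequency([0.5, 0.25, 0.75])
    print("label frequency c:", c)
    print("calibrated activity for g(x)=0.25:", calibrated_activity(0.25, c))
    print("hidden positive fraction:", hidden_positive_fraction([0.2, 0.5], c))
```

Dividing by c corrects for the fact that only some active peptides have been tested, so a modest raw score on an untested sequence can still correspond to a substantial probability of activity.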
A visual analytics tool based on DenseMap will project high-dimensional embeddings into 2D or 3D space. Clusters will be annotated with metadata such as antimicrobial activity, solubility, hemolysis, and mechanism of action to highlight promising and unexplored peptide regions. The following major outcomes are anticipated: a unified multimodal embedding with a calibrated PU classifier for activity prediction; an open, reproducible framework based on a standardized benchmarking protocol for consistent comparisons across studies; and an interpretable visual tool that maps both explored and under-sampled regions of peptide space, augmented with relevant biological metadata.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Award ID: 2553227
Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Jesus Beltran Verdugo
Institution: California State L A University Auxiliary Services Inc., LOS ANGELES, CA
Award Amount: $199,868
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2553227
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2553227.html
Grant Details
$199,868 - $199,868
April 30, 2028
LOS ANGELES, CA