The immune system protects the body by recognizing abnormal cells and infectious threats, yet the molecular rules that allow immune cells to distinguish dangerous targets from healthy cells remain poorly understood. This project will develop new computational approaches to better understand immune recognition, a fundamental problem in biology with broad relevance to health, cancer immunotherapy, autoimmunity, and future biomedical discovery. The project will also advance computing by developing new methods to model biological systems using structure-aware protein language models. In addition to the research activities, the project will support a month-long summer program in New York City that introduces high school students to immunology, data science, and machine learning through hands-on projects and mentorship. Openly shared software, educational materials, and datasets will help broaden access to computational biology and strengthen the future scientific workforce.

This project will develop structure-aware protein language models to decode how T cell receptors recognize peptide antigens presented by major histocompatibility complex molecules. The research will generate high-confidence three-dimensional models of paired T cell receptors, represent local structural environments as symbolic tokens, and integrate sequence, structure, and spatial information in a transformer architecture trained with masked multimodal prediction and contrastive learning. The resulting representations will be tested for their ability to identify shared antigen specificity across diverse immune repertoires and will then be integrated with single-cell transcriptomic profiles to link receptor recognition with cellular function. Experimental studies will be used to evaluate model predictions and iteratively refine the framework.
The project is expected to produce new computational methods for modeling molecular recognition, new mechanistic insight into immune specificity, and open tools that can be reused across immunology and machine learning–driven biology.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2542232
Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT, 01002829DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Diego Chowell
Institution: Icahn School of Medicine at Mount Sinai, NEW YORK, NY
Award Amount: $204,412
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2542232
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2542232.html

CAREER: Structure-Aware Protein Language Models for Decoding Immune Recognition
