CAREER: Distribution-In-Distribution-Out Analytics: Modeling, Inference, and Applications in Data-Rich Science and Engineering
National Science Foundation
Description
This Faculty Early Career Development Program (CAREER) grant will contribute to advancing the nation’s leadership in data science by establishing new statistical methods for analyzing data that take the form of probability distributions rather than single numbers. In many modern scientific and engineering applications, including single-cell biology, biotechnology, and advanced manufacturing, data are naturally represented by probability distributions, such as histograms or probability functions, that describe variability within measurements. However, most current analytical methods reduce this rich distributional information to simple summary statistics, such as averages, thereby discarding valuable information and potentially limiting scientific insights. This project addresses the fundamental challenge of building rigorous statistical tools capable of directly modeling and drawing inferences from distribution-valued data. By preserving the full structure of the data, these methods will enable researchers and practitioners to uncover complex patterns and relationships that existing approaches may miss. The resulting tools have the potential to accelerate discoveries in areas such as perioperative medicine and biomarker identification for diagnosis and drug development, thereby strengthening the global competitiveness of the nation in data science and artificial intelligence. The educational components will provide training opportunities for undergraduate and high school students in engineering statistics and biotechnology, while preparing graduate students to work in this emerging interdisciplinary field. This project will also engage the broader community through open-source software, accessible educational materials, and data challenge competitions. These efforts aim to broaden participation in science, technology, and engineering, and to support the development of a competitive workforce equipped with next-generation data analysis skills.
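As a rough illustration of the point above about averages discarding information, the following minimal sketch compares two small samples that share the same mean but have very different shapes. The abstract does not name a specific metric on distributions; the 2-Wasserstein distance used here is an assumption chosen only because it is a common choice for one-dimensional distributions, and the function name `wasserstein2_1d` and the toy data are hypothetical.

```python
import numpy as np


def wasserstein2_1d(x, y):
    """2-Wasserstein distance between two equal-size 1D empirical samples.

    Assumption: both samples have the same number of points, so the
    optimal coupling simply matches sorted values (empirical quantiles).
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch requires equal sample sizes")
    return float(np.sqrt(np.mean((x - y) ** 2)))


# Hypothetical toy data: same mean, different distributional shape.
unimodal = np.array([4.0, 5.0, 5.0, 6.0])
bimodal = np.array([0.0, 0.0, 10.0, 10.0])

# A mean-only summary cannot tell the two samples apart.
mean_gap = abs(unimodal.mean() - bimodal.mean())

# A distance on the full distributions can.
dist = wasserstein2_1d(unimodal, bimodal)

print(f"difference in means: {mean_gap}")
print(f"2-Wasserstein distance: {dist:.4f}")
```

Here the difference in means is zero while the distributional distance is clearly positive, which is the kind of structure a mean-only analysis would miss.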
This research will establish the statistical foundations for a distribution-in-distribution-out analytical framework that enables regression modeling and inference when both predictors and responses may be either scalar-valued or distribution-valued random variables. The project leverages differential geometry to define linear and nonlinear regression intrinsically in the metric space of probability distributions, moving beyond traditional approaches that operate exclusively in Euclidean space. Both the theory and statistical properties of these models will be developed, as well as new methods for analyzing model uncertainties. These results are expected to provide a foundational paradigm for artificial intelligence based on distributional data. The models developed will be validated through applications to single cell RNA sequencing, single cell proteomics, and perioperative medicine data, where preserving the full distributional information of protein and gene expression intensities is expected to yield more accurate and robust scientific discoveries. The project will also produce and maintain an open-source software module implementing the developed methods, making these analytical tools broadly accessible to researchers and practitioners across multiple disciplines. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Award ID: 2543434
Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Xiaoyu Chen
Institution: SUNY at Buffalo, AMHERST, NY
Award Amount: $500,001
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2543434
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2543434.html
Grant Details
$500,001 - $500,001
June 30, 2031
AMHERST, NY