Generative artificial intelligence can create realistic text, images, and other complex data, offering new ways to support science, education, industry, and public decision-making. However, these systems can also make errors that are difficult for users to detect or measure. As a result, organizations may not know when generated data can be trusted, whether a system works equally well across different populations, or how its performance changes over time. This project addresses these challenges by developing statistical tools that measure the reliability and uncertainty of generative artificial intelligence systems. These tools will help researchers, educators, businesses, and public institutions use synthetic data more safely and effectively. One important application is the creation of digital twins of student populations, which can provide safe testing environments for new educational technologies without exposing real students to untested systems. The project also broadens participation in science and technology through hands-on activities for K-12 students, new graduate courses, doctoral student mentoring, and community workshops.

This project develops a statistical framework for evaluating and designing uncertainty-aware generative models. The investigator will pursue three connected research thrusts. First, the project will develop methods to measure the overall fidelity of black-box generative models using the concept of effective sample size, which provides an interpretable way to assess how much reliable information the synthetic data contain. Second, the project will extend these methods to provide local and context-aware uncertainty estimates for specific inputs, subpopulations, and changing environments. These estimates will help identify when a generative model is reliable and when its outputs require caution. Third, the project will develop statistically guided training strategies that improve generative models for counterfactual analysis and choice modeling. These strategies include artificial-intelligence-assisted articulation for modeling human needs and knowledge distillation methods for learning counterfactual relationships. By embedding statistical goals directly into generative modeling, the project will advance the use of synthetic data in settings where real data are scarce, costly, sensitive, or inaccessible. The expected outcomes include new theory, practical methodology, open-source software, and educational resources that support the trustworthy deployment of generative models.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2544147
Program: 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Kaizheng Wang
Institution: Columbia University, NEW YORK, NY
Award Amount: $243,000
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2544147
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2544147.html

CAREER: Statistically Grounded Generative AI: Uncertainty Quantification and Principled Design



