This project develops mathematical tools for assessing when artificial intelligence (AI) systems can be trusted after their predictions are used to make decisions. Modern AI tools help rank drug candidates, support medical decisions, screen large data sets, and suggest scientific hypotheses. In these settings, predictions are often used selectively and repeatedly: users may follow up only on top-ranked cases, choose confidence levels after seeing outputs, or let automated tools gather evidence over time. Standard uncertainty statements can become overly optimistic in such adaptive pipelines, because they are typically designed for a use that is fixed in advance rather than chosen after the outputs are seen. This project will build a statistical quality-control layer for AI-assisted decisions and discoveries that helps users understand how reliable the resulting decisions are. The work can improve reproducibility and efficiency in biomedical science, drug discovery, and other data-intensive fields where experiments are costly and errors can slow progress. The project will also train graduate and undergraduate students in modern statistics, trustworthy AI, and responsible data science, with efforts to broaden participation. Publicly available software, benchmarks, and teaching materials will support education, reproducible research, and safer use of AI.

The technical goal of this project is to develop finite-sample, model-agnostic, and distribution-free inference methods for AI systems used inside adaptive decision and discovery pipelines. The work draws on conformal prediction, predictive inference, selective inference, multiple testing, permutation methods, and anytime-valid testing. The first thrust will develop set-level predictive inference methods for multiple unlabeled instances, including false discovery rate control, family-wise error rate control, global null testing, partial-conjunction testing, and model selection under exchangeability and weighted exchangeability.
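As a concrete illustration of false discovery rate control over many unlabeled instances, the sketch below shows one well-known baseline from the conformal selection literature. It is not the project's method. The sketch computes a conformal p-value for each hypothesis H_j: Y_{n+j} <= c from a labeled calibration set and then applies the Benjamini-Hochberg procedure at level q. It assumes that calibration and test data are exchangeable. The function names `conformal_pvalues` and `benjamini_hochberg`, the threshold c, the identity predictor mu(x) = x, and the synthetic Gaussian data are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def conformal_pvalues(calib_scores: Sequence[float], calib_labels: Sequence[float],
                      test_scores: Sequence[float], threshold: float) -> List[float]:
    """Deterministic conformal p-values for H_j: Y_{n+j} <= threshold.

    Equivalent to the clipped score V(x, y) = M * 1{y > threshold} - mu(x) with
    M large: only calibration points whose label does not exceed the threshold
    ("nulls") can count against a test point.
    """
    n = len(calib_scores)
    null_scores = [s for s, y in zip(calib_scores, calib_labels) if y <= threshold]
    pvals = []
    for t in test_scores:
        # Count calibration nulls that the model ranks at least as high as test point t.
        count = sum(1 for s in null_scores if s >= t)
        pvals.append((1 + count) / (n + 1))
    return pvals


def benjamini_hochberg(pvals: Sequence[float], q: float) -> List[int]:
    """Return indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= q * rank / m:
            k = rank  # step-up: keep the largest rank that passes its threshold
    return sorted(order[:k])


if __name__ == "__main__":
    rng = random.Random(0)

    def draw(size):
        # Hypothetical data-generating process: Y = X + Gaussian noise.
        xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
        ys = [x + rng.gauss(0.0, 0.5) for x in xs]
        return xs, ys

    calib_x, calib_y = draw(500)
    test_x, test_y = draw(200)
    # Hypothetical fitted predictor: mu(x) = x, so the scores are just the features.
    pvals = conformal_pvalues(calib_x, calib_y, test_x, threshold=1.0)
    selected = benjamini_hochberg(pvals, q=0.1)
    false = sum(1 for j in selected if test_y[j] <= 1.0)
    print(f"selected {len(selected)} of {len(test_x)}; false discoveries: {false}")
```

The predictor only determines how test points are ranked. In this baseline, the validity of the selection rests on exchangeability rather than on the model being accurate, which is what makes it model-agnostic. A better predictor mainly increases the number of true positives that are selected.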
The second thrust will study predictive inference under adaptive human use, including selective issuance of prediction sets and data-dependent choices of confidence levels, and will develop selection-conditional guarantees, auditing tools, and calibration methods. The third thrust will develop selective inference methods for automated evidence gathering, with a main focus on generated hypotheses, dataset reuse, and optional stopping through e-values and e-processes. The project will also explore extensions to dynamically expanding hypothesis structures and to settings in which automated evidence may be imperfect. Together, these results will expand the mathematical foundations of uncertainty quantification for black-box AI systems and provide open-source tools for reliable scientific discovery.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2610282 | Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT | Principal Investigator: Ying Jin | Institution: University of Pennsylvania, PHILADELPHIA, PA | Award Amount: $200,000
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2610282
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2610282.html

Distribution-Free Inference for AI-in-Use: Addressing Multiplicity, Selectivity, and Adaptivity
