Description
Scientific findings should come with error rates that mean what they say: among findings assigned a 5 percent chance of error, about 5 in 100 should turn out to be wrong. This standard, called calibration, underlies trusted probability claims from weather forecasting to machine learning, but it is not yet a routine part of the statistical tools used in many large-scale scientific studies. The issue arises whenever researchers must triage long lists of possible discoveries, anomalies, or published claims. In metascience, the question is which findings in the literature will replicate; in AI safety, which suspicious model inputs deserve greater scrutiny. Current methods control the average error rate across an entire list of discoveries, but they rarely provide individual findings with calibrated error probabilities. This award supports research on calibrated hypothesis testing, which will develop methods that distinguish strong evidence from borderline evidence with interpretable, rigorous guarantees. The work will support more reproducible science and safer data-driven systems, while training graduate researchers, developing new instructional materials, and releasing open-source software.

This project will develop theory and methodology for calibrated, large-scale inference. The framework draws upon probabilistic forecasting but addresses a distinct challenge: unlike forecasting, where labels are eventually observed, in multiple testing the ground truth is never revealed, so calibration must be assessed stochastically and established indirectly. The investigators will combine empirical Bayes estimation with frequentist finite-sample guarantees, extending local and boundary false discovery rates beyond settings with independent p-values. Variable selection will serve as the first setting, using knockoff and sign-symmetric statistics to construct local error assessments for selected variables. Conformal outlier detection will extend these ideas to discrete and dependent p-values produced by a shared calibration dataset. Online testing will build on both directions by treating sequential threshold choice as an online learning problem under distribution drift. Together, these three settings will demonstrate that calibrated local error rates constitute a fully functional statistical concept with broad applicability.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2610643
Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Jake Soloff
Institution: Regents of the University of Michigan - Ann Arbor, ANN ARBOR, MI
Award Amount: $146,715
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2610643
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2610643.html
Grant Details
$146,715
May 31, 2029
ANN ARBOR, MI