Artificial intelligence (AI) systems increasingly influence both high-stakes and everyday decisions across many sectors of the economy. These systems, however, are not developed in isolation: they depend on people to provide instructions that describe what the system should do and what outcomes it should avoid. These instructions can take many forms. They may be written explicitly by domain experts or learned from data, such as human preferences over outcomes. Yet providing clear and reliable instructions for intelligent systems is difficult even for relatively narrow applications. Instructions can be too rigid, too vague, or simply incorrect, and any of these problems can cause systems to behave in unintended ways. These failures arise because instructions are created by people, whose reasoning is shaped by limited information, limited context, and common cognitive errors. As AI becomes more widespread, improving how systems interpret human intent will be essential for safety and reliability. This project addresses that challenge by studying how people communicate goals to machines and by designing AI systems that interpret imperfect instructions by reasoning about the intent behind them. The expected outcomes include safer decision-making technologies and new tools that help organizations deploy AI more effectively.

This project develops computational foundations for learning AI specifications from imperfect human input. The research integrates reinforcement learning, Bayesian inference, and computational cognitive modeling with empirical studies of human decision making to better characterize how people communicate goals and where specification errors arise. The work is organized around three research thrusts.
The first thrust, Modeling and Inferring AI Specifications, develops probabilistic models of human reasoning that capture systematic specification errors, and uses these models to enable AI systems to infer more accurate goals from flawed instructions. The second thrust, Richer Inputs and Representations, expands how AI systems learn from people by incorporating diverse forms of input, such as preferences, demonstrations, explanations, gestures, and structured debate. New algorithms and elicitation interfaces will integrate these signals and resolve inconsistencies across modalities. The third thrust, Personalization and Governance, develops methods for learning multiple reward models that reflect differences in human preferences, enabling scalable personalization while avoiding one-size-fits-all objectives.

In parallel, the project will develop educational programs that prepare students to design and govern AI systems. These activities include revising an undergraduate AI course to emphasize the role of human decision making in the design of AI systems, creating a graduate course on AI policy and governance, and expanding the AI Policy Summer School to help build a national workforce fluent in both AI technology and public policy.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2541285
Program: 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Serena Booth
Institution: Brown University, Providence, RI
Award Amount: $442,685
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2541285
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2541285.html

CAREER: Inferring Specifications for AI by Modeling Humans