The range of users of artificial intelligence (AI) systems has vastly broadened in recent years, and decisions made with the help of AI now affect almost everyone. Naturally, individuals have different preferences over how the AI should behave, which poses new challenges because traditional approaches for developing and deploying AI assume that users have the same preferences. This project builds a theoretical understanding of the blind spots of current AI development and deployment pipelines by studying, for example, when disagreements can make the AI system adopt an unbalanced position or behave erratically. In place of these pipelines, the project designs new algorithms for the alignment, fine-tuning, and deployment of AI that are provably robust to disagreeing user preferences and strike a sensible compromise between them. To bring the benefits of this research to the public, the project will also produce two software tools: a website comparing AI models to help users choose the right AI tool for their needs, and an AI-enhanced tool providing live information on group opinions in participatory processes. The project also develops new courses and educational materials that train students in the mathematical foundations of preference-aware AI.

The project will achieve these goals by integrating key concepts from computational social choice theory (which provides methods for aggregating the preferences of individual group members into a collective decision) and generative AI. First, the project analyzes alignment methods through the lens of distortion, which measures the fraction of welfare lost when an algorithm observes only pairwise, stochastic comparisons rather than users' true preferences. This analysis quantifies the shortcomings of the dominant alignment method, reinforcement learning from human feedback, and identifies alternatives with better guarantees.
Second, the project designs alignment methods satisfying proportionality guarantees adapted from social choice theory such as core stability, ensuring that every subset of users with aligned preferences receives influence on the AI system commensurate with the group's size. Third, the project develops algorithms for AI-enhanced collective decision processes that select outcomes from open-ended alternative spaces, such as all possible textual summaries, while satisfying strong representation axioms, with a focus on maintaining guarantees even when the AI components fail.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2543553
Program: 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Paul Goelz
Institution: Cornell University, ITHACA, NY
Award Amount: $367,129
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2543553
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2543553.html

CAREER: Responsiveness to Heterogeneous Preferences in AI Alignment and Deployment
