As artificial intelligence (AI) systems are increasingly deployed in critical domains such as healthcare, scientific discovery, and autonomous decision-making, ensuring that foundation AI models such as large language models (LLMs) align with human values and preferences has become essential for their safe and beneficial deployment. However, most existing approaches rely on large amounts of high-quality labeled preference data and assume clean, stable, well-controlled environments. These assumptions are often violated in real-world scenarios, where data may be limited, noisy, or subject to change over time. This project addresses the fundamental problem of aligning LLMs with human preferences in such open-world settings. The outcomes are expected to improve the data efficiency and reliability of LLMs in high-impact applications, including medical diagnosis and molecular discovery, while also contributing to education and workforce development through the integration of research and training activities.

The project focuses on three key aspects of aligning LLMs with human preferences in open-world settings. First, it develops novel data-efficient preference alignment algorithms that enable LLMs to maintain effective alignment in open-world environments with limited human-annotated preference data. Specifically, when LLMs encounter new tasks or domains, the proposed algorithms aim to minimize reliance on extensive human or AI annotation while maximizing alignment performance across different low-data scenarios. Second, the project aims to enhance the reliability of LLM-based AI systems so that they maintain safe and robust alignment with human preferences when confronted with unreliable inputs. It will develop algorithmic solutions that mitigate various data-quality issues (e.g., distribution shifts, label noise, and human value shifts) while preserving alignment performance across different open-world environments. Third, the project will demonstrate successful deployment in high-stakes domains, including biochemistry and public health, and will reveal fundamental principles of domain-specific preference alignment while maintaining efficiency and reliability guarantees. This cross-disciplinary validation will open new research avenues for aligning LLMs with human preferences in interdisciplinary research. Overall, the transition from closed-world to open-world preference alignment represents a fundamental paradigm shift that will provide critical insights into real-world deployment challenges, benefiting the entire AI community.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2544599
Program: 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Kaize Ding
Institution: Northwestern University at Chicago, EVANSTON, IL
Award Amount: $419,999
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2544599
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2544599.html

CAREER: OPENALIGN: Towards Open-World Preference Alignment for Large Language Models
