This project will strengthen the foundations needed to make advanced language technologies more capable and reliable tools for science. Systems based on large language models can already help researchers search the literature, summarize prior work, answer technical questions, and suggest new hypotheses. Yet these systems introduce problems in scientific settings: they may miss key publications, rely on weak or irrelevant evidence, present claims with unwarranted confidence, or offer persuasive but unfounded answers to questions that current scientific evidence does not yet support. These limitations can waste research time, mislead the scientific community, and reduce trust in methods that could otherwise help accelerate progress in scientific fields important to the Nation, including computer science, medicine, biology, and physics. This project addresses that need by developing the foundations required for more reliable language technologies for science: better ways to evaluate and understand the capabilities of AI models for scientific tasks, better ways to adapt and advance their capabilities for this domain, and better ways to ensure that these models are reliable and that their outputs are evidence-based and appropriately qualified. The project will generate openly available data, benchmarks, methods, software, and tutorials that can help researchers across institutions use these technologies more effectively and responsibly. It will also support graduate and undergraduate training, hands-on research experiences, and activities that contribute to a strong future science and engineering workforce.

The research will pursue three thrusts. First, it will create new evaluation methods and frameworks for scientific tasks, including an evaluation framework for comparing systems on process-level and literature-grounded scientific reasoning, evidence retrieval, and long-form scientific analysis. This work will then provide evaluation testbeds for AI-based evaluators and will measure and improve their alignment with domain experts. Second, it will develop new methods for adapting and advancing large language models in the scientific context. This includes developing compatible learning objectives that leverage the rich structure of scientific literature, building more powerful representations of scientific documents for retrieval and knowledge integration, and designing reasoning methods that can better integrate knowledge as needed. Third, it will develop methods to improve the reliability and control of language models in scientific domains, including techniques for aligning expressed confidence with actual uncertainty, recognizing when scientific questions are unanswerable or underspecified, and guiding systems toward more faithful, evidence-grounded responses, which is crucial in science.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2541654
Program: 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Arman Cohan
Institution: Yale University, New Haven, CT
Award Amount: $355,590
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2541654
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2541654.html

CAREER: Advancing LLMs for AI-Assisted Science: Evaluation, Adaptation, and Reliability
