Many scientific breakthroughs happen when a researcher notices that a technique from one field could solve a problem in another; a discovery in materials science, for example, might share a deep structural similarity with a finding in a different field. Yet with hundreds of new papers appearing each month even within narrow specialties, no individual can read broadly enough to spot these cross-cutting connections reliably. This project develops artificial intelligence systems that help researchers uncover such hidden connections across the scientific literature by recognizing structural parallels between ideas in different domains. Creative reasoning by AI carries risks, however: a connection that looks insightful may turn out to be unfounded. To address this, the systems developed in this project are designed to be transparent, showing researchers the evidence behind every suggestion so they can verify claims and distinguish well-supported insights from speculation. This project serves the national interest by promoting the progress of science. By helping researchers identify promising new directions more quickly, this work has the potential to accelerate discoveries that directly improve quality of life. All tools and datasets will be released publicly, and in partnership with major research conferences, these tools will be integrated into the peer-review process to support real-world scientific evaluation. The project also includes substantial educational efforts: paid summer research internships for Baltimore high school students, a new AI literacy course for pre-college students, integration of research tools into undergraduate teaching, and inclusive workshops and mentoring to broaden participation in computing and artificial intelligence.
This project develops interpretable, large language model (LLM)-based frameworks that support creative scientific discovery while ensuring transparency and grounding in evidence.
The research is organized around four interconnected thrusts. The first thrust develops a general-purpose formalism for analogical retrieval over large scientific corpora. The formalism iterates between retrieval and reasoning in a loop akin to exploration-exploitation, enabling applications such as cross-domain hypothesis generation and tabular summarization of related literature. The second thrust advances temporal reasoning in scientific domains. It begins by systematically evaluating how reliably LLMs encode scientific knowledge over time, analyzing how factors such as publication age, citation impact, and controversy influence what models have memorized. Building on these findings, the thrust develops methods to mine the scientific genealogy of research papers, tracing chains of intellectual influence through recursive analogical retrieval to produce interpretable graphs of how ideas emerge, evolve, and connect. The third thrust creates a principled framework for grounded, verifiable generation. The core innovation is a "myopic" verifier, a secondary model that simulates a time-constrained human fact-checker by assessing each claim using only a narrow excerpt from the cited source. Models are trained via reinforcement learning to produce only claims that are immediately and explicitly verifiable. This approach extends to incorporating verbatim quotations from cited evidence, further reducing the cognitive burden of verification. The fourth thrust builds publicly available scientific tools and evaluation benchmarks, including retrospective analogy benchmarks, grounding evaluation protocols, and a peer-review co-pilot to be deployed at major research conferences. The resulting methods are developed in collaboration with domain experts at the National Institutes of Health but are broadly applicable to any domain requiring structured, cross-domain reasoning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Found
NSF Award ID: 2542238
Program: 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Daniel Khashabi
Institution: Johns Hopkins University, BALTIMORE, MD
Award Amount: $357,069
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2542238
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2542238.html

CAREER: Reasoning-Centered AI for Scientific Discovery
