Many modern Artificial Intelligence (AI) models can produce meaningful text, but they often fail on complex structural and numerical data involving different units, formulas, information describing the data (i.e., metadata), and hierarchies. These failures are especially concerning in areas such as medicine, finance, defense, and space, where even small quantitative mistakes can lead to misleading conclusions and significant negative consequences. This project aims to address this problem by developing a new AI model focused on accurate comprehension of complex numerical and structured data rather than natural language text. The project will help make scientific knowledge more transparent and accessible, while also supporting education through new teaching materials, student research opportunities, and outreach activities that engage learners in data reasoning. By improving the ability of AI to work correctly with complex numerical and structured data, the project advances the progress of science, supports health and welfare, and strengthens the nation’s capacity for trustworthy data-driven discovery and decision-making.

The project develops the Large Number Model (LNM), a hybrid neural-symbolic model for reliable reasoning over numbers, units, formulas, and complex tabular data. The research includes three main activities: creating scalable methods to extract numerical and structured information from documents, designing model architectures that represent quantities and two-dimensional tabular structures more effectively than text-only systems, and incorporating symbolic validation to check algebraic, dimensional, and semantic consistency. The project will also develop methods for combining quantitative evidence across multiple sources and will evaluate the resulting system through controlled experiments, robustness tests, and benchmark datasets drawn from scientific and medical domains. The expected contribution is a new foundation for AI systems that are more accurate, interpretable, dependable, and compatible with the full data cycle when working with complex numerical and structured knowledge. This, in turn, is expected to maximize the utility of information resources.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2544222
Program: 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Michael Gubanov
Institution: Florida State University, TALLAHASSEE, FL
Award Amount: $301,560
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2544222
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2544222.html

CAREER: Numerically Literate AI via the Large Number Model and Foundational Data Curation Methods
