Modern scientific and engineering challenges, from understanding cell growth to predicting material failure and crack formation under stress, require complex modeling and expensive experiments. While machine learning has demonstrated remarkable potential to accelerate scientific discovery for highly complex systems and reduce costs, its adoption in scientific research remains limited by a crucial bottleneck: the shortage of labeled training data. Obtaining large quantities of labeled data for scientific problems is often prohibitively costly, time-consuming, and sometimes physically impossible. This data scarcity creates two additional challenges: trained models often fail when applied to new conditions or experimental contexts, and the reasoning behind their predictions remains opaque, limiting both confidence in the results and the ability to use those results to develop new scientific knowledge. Solving this small-data problem by exploiting information about how systems change over time will allow machine learning to achieve higher performance with limited labeled datasets. This will ultimately accelerate innovation across chemistry, materials science, biology, and engineering, advancing technologies from battery development to manufacturing by reducing costs, enhancing safety, and improving performance through AI-assisted automation and discovery. This project develops a unified framework for enabling machine learning with minimal labeled data in scientific applications by taking a dynamical systems approach.
The research involves three complementary algorithmic advancements: first, developing new methods to learn families of evolution equations from only a handful of dynamic trajectories; second, developing computer vision algorithms that leverage known or discovered evolution equations to learn from unlabeled or sparsely labeled experimental time series; and third, improving active learning strategies for extremely small labeling budgets that will be leveraged to enhance the first two advancements. Interpretable symbolic methods will be used throughout to model evolution, enabling extrapolation to new conditions. These methods will be validated on real scientific applications spanning multiple domains in materials science and chemical engineering to demonstrate their broad utility and generalizability across fields. The project will also support the development of educational tools and methodologies for training a workforce equipped to address the complexities of the highly interdisciplinary field of scientific machine learning. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2544082
Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Qian Yang
Institution: University of Connecticut, Storrs, CT
Award Amount: $394,370
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2544082
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2544082.html

CAREER: Data-Driven Learning of Interpretable and Extrapolative Models of Complex Systems with Sparse Data
