While natural intelligence often learns sophisticated skills through simple visual observation, current artificial intelligence (AI) systems largely lack this ability. Most modern AI systems require massive amounts of text or human-labeled data to understand the world, a dependency that limits the ability of machines to perform complex physical tasks that are difficult to describe in words. To address these limitations, this project creates a scientific framework that allows machines to learn directly from passive observations and active interactions with the physical environment. The project moves toward a new paradigm where video serves as the primary medium for machine intelligence, enabling autonomous systems to plan and act by observing videos of human and robotic behavior. By fostering the development of more capable and helpful autonomous agents, the project serves the national interest in AI leadership and technical workforce development. These efforts include specialized course design, cross-disciplinary collaborations, and mentorship programs that support students from high school through the doctoral level.

This research establishes a new paradigm for machine intelligence centered on the concept of adaptable video blueprints. These blueprints function as a representation that allows an agent to translate a visual experience into a sequence of physical actions generalizable across diverse tasks, environments, and embodiments. Three integrated thrusts drive the technical approach. The first thrust develops visual planners that utilize video generation models to causally predict the future states necessary to reach a target goal. The second thrust focuses on inverse dynamics to map these predicted visual sequences into specific motor commands. The third thrust implements an automatic self-improvement loop, allowing the agent to refine its planning and execution through continuous experience and adaptation. This award advances the fields of computer vision and AI by grounding visual generation in physical interaction and providing a scalable method for machines to acquire sophisticated skills with minimal human intervention.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2543166
Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Chen Sun
Institution: Brown University, Providence, RI
Award Amount: $346,851
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2543166
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2543166.html

CAREER: Learning Adaptable Video Blueprints for Intelligent Systems
