Understanding how humans interact with the physical world is a fundamental challenge in developing intelligent systems capable of assisting, learning from, and safely interacting with people. Consider a rehabilitation therapist remotely tracking patient recovery from everyday videos, novice athletes receiving feedback from expert demonstrations, or robots learning complex manipulation tasks by observing human actions. Realizing this vision requires artificial intelligence systems that understand not only visual appearance but also the underlying physics of human interactions, including how people apply forces, maintain balance, and manipulate objects in three-dimensional space. However, current computer vision methods often produce visually plausible yet physically impossible reconstructions: objects float unsupported, bodies pass through solid surfaces, and interactions violate basic balance and force constraints. Such inaccuracies limit reliable deployment in health, safety, and robotics applications. This project develops a unified framework for learning physically grounded models of three-dimensional human-world interactions from real-world video, enabling more reliable analysis of movement, improved assistive technologies, and new tools for robotics and embodied intelligence. The project integrates research and education through student training, curriculum development, open software release, and outreach activities that connect computer vision with biomechanics, sports science, and robotics.

The research develops a scalable framework for physics-aware perception of human-world interactions. The work is organized in three integrated directions. First, it reconstructs physically consistent three-dimensional motion from monocular video by combining geometric reconstruction with dynamic simulation to estimate contacts, forces, and joint torques. Second, it leverages these simulation-consistent reconstructions to train generative models that capture the dynamics of human-object interaction beyond laboratory settings. Third, it incorporates physical reasoning into modern large-scale vision models, enabling systems that reason about effort, stability, and contact while preserving strong semantic understanding. By integrating established physical simulation tools with contemporary machine learning models and Internet-scale visual data, the project advances scalable approaches for understanding, predicting, and reasoning about human movement in natural environments.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2544200
Program: 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Georgios Pavlakos
Institution: University of Texas at Austin, AUSTIN, TX
Award Amount: $349,323
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2544200
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2544200.html

CAREER: Physics-based Perception of 3D Human-World Interactions
