Modern artificial intelligence (AI) systems are increasingly limited not by arithmetic, but by memory. As frontier AI models become more capable, they require far more data to be moved, stored, and accessed efficiently. These workloads systematically generate large volumes of short-lived data that are written to memory, consumed, and quickly discarded, as well as long-lived data that must be retained reliably across much longer time scales. Conventional memory systems are poorly optimized for this behavior, as they are typically designed as one-size-fits-all storage, resulting in excessive energy consumption and increasingly limited density scaling. This project addresses that mismatch by developing a computing infrastructure that treats data persistence as a central design consideration: it statically and dynamically matches short-lived and long-lived data to differentiated memory architectures and technologies, each optimized for the appropriate retention window. The result is a more efficient and sustainable foundation for accelerating large-scale AI systems in both datacenter and edge settings. The project also supports education and workforce development through new course materials, industry engagement, and open-source resources that expand participation in next-generation AI hardware accelerator design and computer engineering.

The project develops a retention-aware computing stack that aligns AI application data lifetimes with heterogeneous memory architectures and technologies offering different retention times and densities. The research is organized around four integrated activities: (1) building a profiling framework to characterize how long different data values remain useful and to map those lifetimes onto suitable memory tiers; (2) developing algorithmic techniques that restructure computation and data movement to better satisfy retention constraints; (3) designing compilation and scheduling methods for retention-aware data placement across differentiated memories; and (4) validating the resulting hardware-software co-design through the agile design and tapeout of memory-centric AI hardware accelerator chips. Together, these efforts will establish a cross-layer framework for rethinking memory-system design around data persistence and create a foundation for more energy-efficient, high-performance hardware systems for the AI era.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2541050
Program: 01002930DB NSF RESEARCH & RELATED ACTIVIT,01003031DB NSF RESEARCH & RELATED ACTIVIT,01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Thierry Tambe
Institution: Stanford University, STANFORD, CA
Award Amount: $424,237
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2541050
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2541050.html

CAREER: Enabling Efficient AI Computing at Scale with Heterogeneous Retention-Aware Memory Systems
