CAREER: Approximation-First Telemetry for Hyperscale Networked Systems
Description
As cloud computing, artificial intelligence infrastructure, and internet services continue to grow, it becomes increasingly important to monitor the large networked systems that support communication, commerce, education, health, and science. These systems include massive collections of servers, network devices, storage services, and software components that must work together reliably and efficiently. However, the data generated about network traffic, resource usage, and failures can be too large to analyze in full, especially when operators need answers in real time. This project develops an approximation-first approach to telemetry for hyperscale networked systems, using compact, informative data summaries to answer important monitoring questions quickly while greatly reducing cost and overhead.

The project establishes an end-to-end approximation-first telemetry architecture for hyperscale networks through four research thrusts. The first develops mergeable summaries that can be created on end hosts and networked devices while tracking uncertainty. The second develops low-latency aggregation and query methods that answer telemetry questions directly from these summaries. The third develops learning-guided compression for long-term telemetry storage using both lossy and lossless approaches. The fourth creates a management engine that maps user goals for accuracy, responsiveness, and cost into efficient telemetry configurations. Together, these thrusts advance telemetry systems, networked systems, and large-scale distributed computing. The project's cost-effective telemetry can help operators detect network anomalies, bottlenecks, failures, and attacks more quickly while lowering the compute, storage, and energy required for monitoring.
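To make the idea of a mergeable summary that tracks uncertainty concrete, the sketch below uses a Count-Min sketch, a well-known frequency summary. This is an illustration chosen here, not the project's design: the abstract does not name any specific data structure, and the class name, parameters (`epsilon`, `delta`), and flow keys are all hypothetical. Two hosts each summarize their own traffic, and the summaries are combined by adding counters cell by cell.

```python
import hashlib
import math


class CountMinSketch:
    """Illustrative mergeable frequency summary (assumption: not the project's method)."""

    def __init__(self, epsilon: float = 0.01, delta: float = 0.01):
        # Standard Count-Min sizing: an estimate exceeds the true count by
        # more than epsilon * total with probability at most delta.
        self.epsilon = epsilon
        self.delta = delta
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1 / delta))
        self.total = 0
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row: int, key: str) -> int:
        # One salted hash per row; blake2b gives the same result on every host,
        # which is what makes independently built sketches mergeable.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        self.total += count
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key: str) -> int:
        # Collisions only inflate counters, so the minimum never undercounts.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))

    def error_bound(self) -> float:
        # Additive overestimate bound that holds with probability >= 1 - delta.
        return self.epsilon * self.total

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share dimensions to be merged")
        merged = CountMinSketch(self.epsilon, self.delta)
        merged.total = self.total + other.total
        merged.rows = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.rows, other.rows)
        ]
        return merged


# Hypothetical usage: two hosts summarize packets per source address.
host_a, host_b = CountMinSketch(), CountMinSketch()
for _ in range(5):
    host_a.add("10.0.0.1")
for _ in range(3):
    host_b.add("10.0.0.1")
host_b.add("10.0.0.2")

combined = host_a.merge(host_b)
print(combined.estimate("10.0.0.1"), "+ at most", combined.error_bound())
```

Because merging is plain counter addition, the combined sketch is identical to one built from all the traffic in a single place, and the reported error bound grows only with the total number of items seen.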
The project will also create educational materials and hands-on learning opportunities in networking, cloud computing, and systems, and will release open-source software to support researchers, students, and practitioners building new analytics tools. Project software, documentation, and research artifacts will be released through a project repository (frootlab.cs.umd.edu/projectasap) on web resources hosted by the University of Maryland. These materials may include software, publications, experimental artifacts, and selected datasets or benchmarks. Public resources will be maintained for at least five years after the project ends or after data release to support reproducibility, reuse, and follow-on research.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2544434
Program: 01002829DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Zaoxing Liu
Institution: University of Maryland, College Park, COLLEGE PARK, MD
Award Amount: $358,200
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2544434
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2544434.html
Grant Details
$358,200 - $358,200
May 31, 2031
COLLEGE PARK, MD