Many critical online decision systems, including clinical support, financial risk management, and autonomous technologies, must look beyond average performance to avoid rare but catastrophic "tail events." Traditional reinforcement learning often summarizes future outcomes as a single expected value, which masks significant risks and uncertainty. This research addresses this limitation by developing distributional reinforcement learning methods that learn the full range of possible outcomes to support safer, risk-aware, and privacy-preserving decision-making. By improving the trustworthiness of systems in health, finance, and operations, this work strengthens the intersection of machine learning, artificial intelligence, and statistics while promoting the responsible use of sensitive individual data. Additionally, the project supports education by training students at the intersection of statistics, machine learning, optimization, and responsible artificial intelligence.

The research focuses on quantile temporal difference learning, a scalable model-free method for estimating return quantiles from observed transitions. First, the project will establish finite-time guarantees for quantile temporal difference learning in both synchronous settings and asynchronous settings with Markovian data, including bounds for quantile estimation error and for the accuracy of the estimated return distribution. Second, the project will develop statistical inference methods for distributional reinforcement learning, including online bootstrap procedures for confidence intervals for return quantiles and offline methods for return quantiles and conditional value at risk when either the number of trajectories or the number of time points is large. Third, the project will develop trustworthy distributional reinforcement learning methods for constrained decision making and privacy protection.
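For readers unfamiliar with quantile temporal difference learning, the following is a minimal tabular sketch of the standard textbook update, not the project's own algorithm or software. The function names `qtd_update` and `cvar_lower`, the array `theta` (one row of quantile estimates per state), and the choice of midpoint quantile levels are all assumptions made for illustration.

```python
import numpy as np


def qtd_update(theta, s, r, s_next, gamma, alpha, done=False):
    """Apply one tabular quantile TD update in place for a transition (s, r, s_next).

    theta is a hypothetical (num_states, m) array: theta[s, i] estimates the
    tau_i-quantile of the return from state s.
    """
    m = theta.shape[1]
    taus = (2 * np.arange(m) + 1) / (2 * m)  # assumed midpoint quantile levels
    # Bootstrapped return samples built from the next state's quantile estimates.
    next_q = np.zeros(m) if done else theta[s_next]
    targets = r + gamma * next_q
    # below[i, j] is 1 when target j falls below the current estimate i.
    below = (targets[None, :] < theta[s][:, None]).astype(float)
    # Move each estimate up with weight tau_i and down with weight (1 - tau_i).
    theta[s] += alpha * (taus - below.mean(axis=1))
    return theta


def cvar_lower(quantiles, level):
    """Approximate lower-tail CVaR by averaging the quantile estimates at or below `level`."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    m = len(q)
    taus = (2 * np.arange(m) + 1) / (2 * m)
    k = max(1, int(np.sum(taus <= level)))  # always keep at least one quantile
    return q[:k].mean()
```

Calling `qtd_update` repeatedly on observed transitions moves each row of `theta` toward the return quantiles of that state, and `cvar_lower(theta[s], 0.1)` then gives a rough estimate of the average return in the worst 10% of outcomes. The constant step size `alpha` is a simplification; the finite-time guarantees described above concern how such choices affect estimation error.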
The research will provide theory, algorithms, numerical studies, and software for inference-ready and privacy-aware distributional reinforcement learning, with anticipated applications in personalized healthcare, risk-sensitive financial decisions, and robust resource allocation. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2610563 | Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT | Principal Investigator: Lan Wang | Institution: University of Miami, CORAL GABLES, FL | Award Amount: $299,601
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2610563
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2610563.html

Distributional Reinforcement Learning for Risk-Sensitive Sequential Decision Making: New Theory and Methods
