open | CHARLOTTESVILLE, VA

CAREER: Advancing Differentially Private Data Synthesis: A Holistic Approach

National Science Foundation

Description

This project studies how to create synthetic datasets that retain useful patterns from sensitive data while protecting the privacy of individuals. Many hospitals, companies, public agencies, and researchers need data for purposes such as improving services and testing ideas, but they often cannot share original records because those records contain private information. This project addresses that gap by making data sharing safer and more useful. Its novel contributions are a general way to break synthetic data generation into two connected steps, new methods that combine classical statistical ideas with modern learning tools, and systematic ways to use public data and existing models without weakening privacy protection. Its broader significance is that it expands safe access to data for research and education, strengthens privacy practice in data-driven fields, and creates training and research opportunities for students.

Specifically, the research develops a framework that separates synthetic data generation into two steps: extracting information from sensitive data under the formal protection of differential privacy, and reconstructing synthetic data from the extracted information. Within this framework, the project has three research thrusts. First, for tabular data, it examines why statistical methods often outperform neural network methods and designs hybrid methods that combine the strengths of both approaches. Second, for image and multimodal data, it studies adaptive high-order projections, including Fourier representations, to capture broad structure and preserve relationships across data types. Third, it develops a double-cone framework for selecting, expanding, and adapting public data sources and for using existing models, so that public information can improve synthetic data quality in a systematic way. The project also brings these ideas into courses, student research, open-source tools, and public demonstrations.
The expected results are stronger foundations and more practical methods for privacy-protected synthetic data generation across application areas. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2543284
Program: 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Tianhao Wang
Institution: University of Virginia Main Campus, CHARLOTTESVILLE, VA
Award Amount: $395,277
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2543284
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2543284.html
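
The two-step split described in the abstract (private information extraction, then reconstruction) can be illustrated with a toy sketch. This is not the project's method: it swaps in the textbook Laplace mechanism on a single categorical histogram. The function names, the three-value domain, the epsilon value, and the synthetic sample size are all hypothetical choices made for illustration. The sketch also assumes that the domain and the output size are public and that neighboring datasets differ by adding or removing one record.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def extract_noisy_histogram(records, domain, epsilon, rng):
    """Step 1 (illustrative): extract a histogram under epsilon-DP."""
    counts = Counter(records)
    # Adding or removing one record changes exactly one count by 1,
    # so the L1 sensitivity is 1 and the Laplace scale is 1 / epsilon.
    scale = 1.0 / epsilon
    # `domain` must be fixed in advance, not derived from the data.
    return {v: counts.get(v, 0) + laplace_noise(scale, rng) for v in domain}


def reconstruct_synthetic(noisy_hist, n, rng):
    """Step 2 (illustrative): sample synthetic records from noisy counts.

    This only post-processes the noisy histogram, so it consumes no
    additional privacy budget.
    """
    values = list(noisy_hist)
    weights = [max(noisy_hist[v], 0.0) for v in values]  # clip negatives
    if sum(weights) <= 0:
        weights = [1.0] * len(values)  # fall back to uniform sampling
    return rng.choices(values, weights=weights, k=n)


# Toy data. Python's `random` module is not suitable for real deployments,
# which need a cryptographically secure noise source.
rng = random.Random(0)
domain = ["A", "B", "C"]
sensitive = ["A"] * 60 + ["B"] * 30 + ["C"] * 10

noisy = extract_noisy_histogram(sensitive, domain, epsilon=1.0, rng=rng)
synthetic = reconstruct_synthetic(noisy, n=100, rng=rng)
print(Counter(synthetic))
```

Because the second step touches only the noisy histogram and never the sensitive records, the privacy guarantee of the first step carries over to the synthetic output.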


Grant Details

Funding Range

$395,277 - $395,277

Deadline

September 30, 2031

Geographic Scope

CHARLOTTESVILLE, VA

Status

open

