Generative artificial intelligence (AI) now produces text at scale, which creates urgent needs for trustworthy ways to identify and trace AI-generated content. This project advances watermarking, a family of methods that embed a hidden signal into generated text so it can be identified later, while keeping the text useful and natural. A major goal is to enable more reliable attribution than current approaches, including the ability to encode more than a single yes-or-no identifier so that content can be traced to a specific model, system, or authorized use. The project also strengthens resilience against attempts to erase the watermark or to fabricate text that falsely appears watermarked. Outcomes support responsible use of generative AI in research, education, and society by improving tools for protecting intellectual property in datasets, increasing trust in automated reviews and other AI-assisted writing, and supporting secure communication among AI systems. The project’s educational activities develop course modules and training experiences that prepare students to reason about reliability, security, and trade-offs in generative AI, and they broaden participation through mentored research experiences, community engagement, and outreach activities.

This project studies how to embed multi-bit information into text during large language model generation while preserving text quality and providing reliability, robustness, and security guarantees. The research frames large language model watermarking as a distributional information embedding problem in which the generation process is shaped so that the resulting text remains high quality yet carries decodable information. Because generated text is produced sequentially rather than as independent samples, the project develops theory and algorithms that explicitly address this dependence and that measure distortion using distribution-based criteria.
The research characterizes fundamental trade-offs among detectability, text quality, information rate, robustness to removal, and resistance to spoofing. These results guide the design of efficient, deployable multi-bit watermarking algorithms with provable performance guarantees, along with authentication mechanisms that help distinguish genuine watermarks from adversarial forgeries and help detect removal attempts. The project also develops practical methods that extend beyond basic AI versus human attribution, including dataset intellectual property protection, in-context watermarking for detecting AI-generated reviews, and secure message passing among large language model-based agents.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

NSF Award ID: 2543381
Program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
Principal Investigator: Yuheng Bu
Institution: University of California-Santa Barbara, SANTA BARBARA, CA
Award Amount: $600,000
View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2543381
View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2543381.html

CAREER: LLM Watermarking and Beyond: Foundations and Algorithms via Distributional Information Embedding
