Understanding causal mechanisms is a primary goal of scientific inquiry and an increasingly important objective in modern machine learning and artificial intelligence. In contrast to “classical” causal inference, where the goal is to quantify the causal effect of a pre-specified treatment on a pre-specified outcome, this research will focus on causal discovery, which considers entire complex systems and seeks to identify possible causal relationships among many variables simultaneously. Causal discovery has growing relevance in AI and machine learning (ML) applications that require interpretability, robustness, and scientific reasoning from high-dimensional data. For instance, biologists may apply causal discovery to infer causal structure in intracellular networks, neuroscientists may apply causal discovery to recover causal relationships between brain regions, and ML researchers may use these methods to build more reliable predictive models and foundation models that better capture underlying data-generating processes. Despite many exciting theoretical advances and some promising initial applications, important challenges still hinder the widespread use of causal discovery in empirical research and AI systems. This project will develop theory and methods that broaden the use of causal discovery in the empirical sciences and machine learning, enabling researchers across disciplines to apply these methods in a practical, trustworthy, and reliable manner. Most notably, the research will enable practitioners to rigorously reason about uncertainty in estimated causal structure. The project will also include activities to communicate causal discovery tools to end users in an accessible way, as well as intentional education and training for students at both the undergraduate and graduate levels. 
The project will primarily consider causal models that can be represented as directed graphs; thus, the goal of causal discovery is to select the causal graph that corresponds to the data-generating mechanism. The research will consist of three main components. The first component will develop methods to compute frequentist confidence sets for causal structure in settings that combine experimental and observational data, including settings commonly encountered in modern machine learning pipelines. These methods will also allow practitioners to adaptively select experiments while maintaining formal coverage guarantees, with potential applications to active learning and sequential decision-making systems. The second component will develop methods that account for uncertainty in the causal graph when estimating a causal effect. When the causal graph is unknown, and the estimand is a specific causal effect, the causal graph is a nuisance parameter. This component will yield methods that rigorously measure the total uncertainty in a causal effect by also accounting for uncertainty in the graph, thereby improving the reliability and interpretability of downstream AI and ML models. Finally, the third component will develop causal discovery methods for data subject to measurement error or generated from mixture models, including high-dimensional and heterogeneous datasets frequently encountered in contemporary AI applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Award ID: 2542851 | Program: 01003031DB NSF RESEARCH & RELATED ACTIVIT, 01002930DB NSF RESEARCH & RELATED ACTIVIT, 01002627DB NSF RESEARCH & RELATED ACTIVIT | Principal Investigator: Yu-hsuan Wang | Institution: Cornell University, ITHACA, NY | Award Amount: $303,663 | View on NSF Award Search: https://www.nsf.gov/awardsearch/show-award/?AWD_ID=2542851 | View on Research.gov: https://www.research.gov/awardapi-service/v1/awards/2542851.html

CAREER: Causal Discovery: Methodology Driven by Practice
