Time: 4:30-5:30 p.m.
Date: Wednesday, November 19, 2025
Speaker: Ilya Shpitser, John C. Malone Associate Professor, Johns Hopkins Whiting School of Engineering
Title: Graphical models for missing data not at random: identification, inference, and imputation

Abstract: Missing data is a pervasive problem in data analyses, resulting in datasets that contain censored realizations of a target distribution. Many approaches to inference on the target distribution using censored observed data rely on missing data models represented as a factorization with respect to a graph. I describe a simple characterization of all identified missing data models where the full data distribution factorizes with respect to a directed acyclic graph (DAG). We show how statistical inference may be performed within maximum likelihood and semi-parametric frameworks in this class of models. In addition, we discuss how Markov restrictions in this model class naturally lead to an imputation procedure analogous to Gibbs sampling procedures for the missing at random model, such as MICE and Amelia, while allowing imputation even in high dimensional settings where many missingness patterns have no support.
This is joint work with Rohit Bhattacharya, Razieh Nabi, Trung Phung, and Kyle Reese.
Bio: Ilya Shpitser, a John C. Malone Associate Professor in the Johns Hopkins University Department of Computer Science, works on causal and semiparametric inference, missing data, and algorithmic fairness, addressing ubiquitous data complications that may arise in datasets of all types, such as those obtained from social networks, electronic medical records, criminal justice databases, or longitudinal studies. Shpitser is also a member of the Data Science and AI Institute.
His methods yield principled approaches to detecting and addressing disparities and algorithmic bias, understanding causal pathways, and making appropriate causal inferences in settings where observations are systematically censored, unobserved confounders are present, observed realizations are correlated, or the problem is sufficiently complex that simple parametric approaches are unrealistic. The goal of his work is to allow inferences about cause-effect relationships to be made from complex, high-dimensional observational data, which is a crucial task in the empirical sciences and rational decision-making.