Statistics Seminar Speaker: Pragya Sur, 05/03/2023

Pragya Sur is an Assistant Professor in the Statistics Department at Harvard. Prior to her current position, she was a postdoctoral fellow at the Center for Research on Computation and Society, Harvard SEAS. She obtained her Ph.D. from Stanford Statistics in 2019. Her research spans high-dimensional statistics and machine learning theory, with a focus on high-dimensional regression, classification, causal inference, ensemble learning, and learning under distribution shifts. Her research is supported by an NSF DMS Award and a William F. Milton Fund Award (both as solo PI). She is an International Strategy Forum (ISF) 2023 Asia Fellow, chosen by Schmidt Futures, a philanthropic initiative founded by Eric and Wendy Schmidt. As part of this, she currently participates in an 11-month, non-residential fellowship program for rising leaders ages 25 to 35 from across Africa, Asia, North America, and Europe. In Fall 2021, she was invited to speak at the National Academies’ Board on Mathematical Sciences and Analytics symposium on Mathematical Challenges for Machine Learning and Artificial Intelligence, and also visited the Simons Institute for the Theory of Computing as a long-term participant. In 2019, she received the Theodore W. Anderson Theory of Statistics Dissertation Award for “deep original results in large sample maximum likelihood theory for logistic regression with a large number of covariates”.

Talk: A new central limit theorem for the augmented IPW estimator in high dimensions

Join via Zoom
Meeting ID: 984 2423 1705
Passcode: 354857

Abstract: Estimating the average treatment effect (ATE) is a central problem in causal inference. Recent advances in the field have studied estimation and inference for the ATE in high dimensions through a variety of approaches. Doubly robust estimators such as the augmented inverse probability weighting (AIPW) estimator form a popular approach in this context. However, the high-dimensional literature surrounding these estimators relies on sparsity conditions, either on the outcome regression (OR) or the propensity score (PS) model. This talk will introduce a new central limit theorem (CLT) for the classical AIPW estimator that applies without such sparsity-type assumptions. Specifically, we will study properties of the cross-fit version of the estimator under well-specified OR and PS models, in the proportional asymptotics regime where the number of confounders and the sample size diverge proportionally to each other. Under assumptions on the covariate distribution, our CLT will uncover two crucial phenomena, among others: (i) the cross-fit AIPW exhibits a substantial variance inflation that can be quantified in terms of the signal-to-noise ratio and other problem parameters; (ii) the asymptotic covariance between the estimators used while cross-fitting is non-negligible even on the root-n scale. These findings are strikingly different from their classical counterparts, and open a vista of possibilities for studying other similar high-dimensional effects. On the technical front, our work utilizes a novel interplay between three distinct tools: approximate message passing theory, the theory of deterministic equivalents, and the leave-one-out approach. Time permitting, I will outline some of these techniques.
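
For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of a two-fold cross-fit AIPW estimate of the ATE. It illustrates only the classical estimator and does not reproduce the talk's results or the speaker's code. Everything specific here is an assumption made for illustration: the helper names (fit_logistic, fit_ols, cross_fit_aipw, simulate), the unpenalized linear outcome model, the logistic propensity model, the clipping of estimated propensities to [0.01, 0.99], and the simulated data with a true ATE of 2.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25, ridge=1e-6):
    """Logistic regression via Newton's method (tiny ridge for numerical stability)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (t - p) - ridge * beta
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def cross_fit_aipw(X, t, y, seed=0):
    """Two-fold cross-fit AIPW: fit nuisances on one fold, evaluate on the other."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), 2)
    fold_estimates = []
    for k in range(2):
        train, test = folds[1 - k], folds[k]
        Xtr, ttr, ytr = X[train], t[train], y[train]
        ps_coef = fit_logistic(Xtr, ttr)              # propensity score (PS) model
        mu1 = fit_ols(Xtr[ttr == 1], ytr[ttr == 1])   # outcome regression, treated
        mu0 = fit_ols(Xtr[ttr == 0], ytr[ttr == 0])   # outcome regression, control
        Xte, tte, yte = X[test], t[test], y[test]
        e = 1.0 / (1.0 + np.exp(-predict_linear(ps_coef, Xte)))
        e = np.clip(e, 0.01, 0.99)                    # assumption: clip for stability
        m1, m0 = predict_linear(mu1, Xte), predict_linear(mu0, Xte)
        # AIPW score: regression contrast plus inverse-probability-weighted residuals
        psi = m1 - m0 + tte * (yte - m1) / e - (1.0 - tte) * (yte - m0) / (1.0 - e)
        fold_estimates.append(psi.mean())
    return float(np.mean(fold_estimates))

def simulate(n, p, seed=0, ate=2.0):
    """Hypothetical data: linear outcome, logistic propensity, constant effect."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    e = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
    t = rng.binomial(1, e).astype(float)
    y = ate * t + X @ np.full(p, 0.5) + rng.normal(size=n)
    return X, t, y

X, t, y = simulate(n=2000, p=5, seed=0)
print("cross-fit AIPW estimate of the ATE:", cross_fit_aipw(X, t, y))
```

In this sketch the number of covariates p is small relative to n, which is the classical setting. The abstract concerns the regime where p grows proportionally with n, where the estimator's fluctuations are described by the new CLT rather than by the classical theory.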