Time: 4:30-5:30 p.m.
Date: Wednesday, February 4, 2026
Speaker: Yihong Gu, Harvard University
Title: Causality pursuit from heterogeneous environments

Abstract: Over the past decade, modern machine learning—especially deep learning and large language models—has achieved remarkable predictive performance. From a statistical perspective, however, standard training objectives minimize population risk under the observed data distribution, so the resulting predictors may rely on spurious associations that fail to generalize under distribution shift and can misrepresent causal relationships.
In this talk, I present a framework for addressing this challenge when domain knowledge is limited. The key idea is to replace pure risk minimization with a refined objective that searches for a set of variables whose predictive relationship with the outcome remains stable across heterogeneous environments, while still controlling prediction error. I will introduce a scalable estimation procedure that is compatible with modern machine learning models, including neural networks, and discuss practical algorithms that enable efficient computation at scale. On the theoretical side, I will characterize the sample efficiency of the method, give a causal interpretation of the refined target and its connection to identifying direct causes in causal discovery, and establish intrinsic computational barriers.
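
The abstract does not spell out the estimator, so what follows is only a toy sketch of the general idea, not the speaker's method. It assumes a linear model, squared-error risk summed over environments, and a hypothetical stability penalty: the squared within-environment covariance between the pooled-fit residual and each selected variable, weighted by a hypothetical tuning parameter `gamma`. The helper names `make_env`, `pooled_fit`, `objective`, and `select` are illustrative. The brute-force subset search is exponential in the number of variables and is not the scalable procedure the talk describes.

```python
# Hypothetical illustration of "stable predictors across environments";
# NOT the procedure presented in the talk.
import itertools

import numpy as np


def make_env(rng, n, s):
    """Toy environment: X1 -> Y -> X2, where the Y -> X2 strength s varies."""
    x1 = rng.normal(size=n)
    y = x1 + rng.normal(size=n)      # Y depends on X1 the same way everywhere
    x2 = s * y + rng.normal(size=n)  # X2 is a child of Y; its link shifts with s
    return np.column_stack([x1, x2]), y


def pooled_fit(envs, subset):
    """Least squares on the pooled data, using only the columns in `subset`."""
    X = np.vstack([x[:, subset] for x, _ in envs])
    y = np.concatenate([ye for _, ye in envs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def objective(envs, subset, gamma):
    """Summed per-environment risk plus gamma times an instability penalty."""
    beta = pooled_fit(envs, subset)
    risk, penalty = 0.0, 0.0
    for x, y in envs:
        xs = x[:, subset]
        resid = y - xs @ beta
        risk += np.mean(resid ** 2)
        # If the fitted relationship is stable, the residual should be
        # uncorrelated with every selected variable inside each environment.
        penalty += np.sum((xs.T @ resid / len(y)) ** 2)
    return risk + gamma * penalty, beta


def select(envs, gamma):
    """Brute-force search over all nonempty variable subsets."""
    p = envs[0][0].shape[1]
    subsets = [list(c) for r in range(1, p + 1)
               for c in itertools.combinations(range(p), r)]
    scores = {tuple(s): objective(envs, s, gamma) for s in subsets}
    best = min(scores, key=lambda s: scores[s][0])
    return best, scores[best][1]


rng = np.random.default_rng(0)
envs = [make_env(rng, 2000, s) for s in (0.5, 3.0)]
print(select(envs, gamma=0.0))   # pooled risk alone keeps the unstable X2
print(select(envs, gamma=10.0))  # the penalty favors the stable set {X1}
```

In this simulation X2 is a child of Y whose link to Y changes between the two environments, so including it lowers pooled risk without being stable; the penalty term is what steers the search back to the direct cause X1.
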
Bio: Yihong Gu is a postdoctoral researcher at Harvard University. He obtained his Ph.D. from the Department of Operations Research and Financial Engineering at Princeton University in 2025, advised by Professor Jianqing Fan. His research focuses on bridging statistics with modern machine learning techniques and learning causal and reliable relations under minimal prior supervision.