Ruoqi Yu is a fifth-year doctoral student in the Department of Statistics at the University of Pennsylvania. Previously, she obtained her bachelor’s degree at the University of Toronto. Her research focuses on causal inference in observational studies, with applications in public policy, public health, and the social sciences. In particular, she has been developing new matching methods for large observational studies using tools from discrete optimization.
Talk: "Matching Methods for Observational Studies Derived from Large Administrative Databases"
A link to this Zoom talk will be sent to the Stats Seminar listserv.
Abstract: Ideally, people study causal relationships with randomized experiments, but these are not always practical or ethical. As a result, causal effects are often studied with non-randomized observational studies. Matching is a common approach in observational studies: in the design stage, it mimics a randomized experiment by creating treated and control groups that are similar in their observed covariates.
As technologies have rapidly developed in recent decades, data sets have grown in size while also becoming more accessible for analysis, e.g., electronic health records, medical claims data, educational databases, and social media data. The increasing sample size poses tremendous challenges for optimal matching in observational studies, which has computational complexity O(N^3). In current practice, very large matched samples are constructed by subdividing the population and solving a series of smaller problems, which can restrict the possible matches in undesirable ways. In the first part of this talk, I propose a single match that uses everyone in the data set and accelerates the computation in a different way. In particular, I reduce the number of candidate matches by using an iterative form of Glover’s algorithm for a doubly convex bipartite graph to determine an optimal caliper for the propensity score. In the second part, I discuss how to improve the covariate balance of matched samples effectively using directional penalties. I also explore the connection between directional penalties and a widely used technique in integer programming, namely Lagrangian relaxation of problematic linear side constraints in a minimum cost flow problem. The methods are applied to a data set from US Medicaid with 198,368 surgical admissions to study the causal effect of having surgery at a children’s hospital on children’s mortality within 30 days of surgery.
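For readers unfamiliar with the caliper idea in the first part of the abstract, the following is a minimal illustrative sketch, not the speaker's implementation. It assumes that "optimal caliper" can be read as the smallest caliper for which every treated unit can still be paired with a distinct control, and it uses a simple bisection as the iterative step. The function names `glover_feasible` and `smallest_feasible_caliper` are hypothetical. The feasibility check is a greedy matching of the kind attributed to Glover for convex bipartite graphs: controls are scanned in sorted order, and each is given to the eligible treated unit whose caliper interval closes first.

```python
import heapq


def glover_feasible(treated, controls, caliper):
    """Return True if every treated score can be paired with a distinct
    control score at distance <= caliper (greedy scan over sorted scores)."""
    t_sorted = sorted(treated)
    c_sorted = sorted(controls)
    heap = []  # upper endpoints (score + caliper) of treated units awaiting a match
    i = 0
    matched = 0
    for q in c_sorted:
        # Open every treated unit whose interval [t - caliper, t + caliper] has started.
        while i < len(t_sorted) and t_sorted[i] - caliper <= q:
            heapq.heappush(heap, t_sorted[i] + caliper)
            i += 1
        # If the earliest-closing interval ended before q, that unit can never be matched.
        if heap and heap[0] < q:
            return False
        # Otherwise give this control to the treated unit whose interval closes first.
        if heap:
            heapq.heappop(heap)
            matched += 1
    return matched == len(t_sorted)


def smallest_feasible_caliper(treated, controls, tol=1e-6):
    """Bisect on the caliper; feasibility is monotone in the caliper width.
    Returns None when there are fewer controls than treated units."""
    lo = 0.0
    hi = max(max(treated), max(controls)) - min(min(treated), min(controls))
    if not glover_feasible(treated, controls, hi):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if glover_feasible(treated, controls, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Each feasibility check costs a sort plus a single pass with a heap, so it avoids the cubic cost of the full optimal matching; the resulting caliper could then be used to discard candidate pairs before a single optimal match over the whole data set is solved.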