Time: 4:30 - 5:30 p.m.
Date: Wednesday, November 5, 2025
Speaker: Gemma Moran, Assistant Professor of Statistics, Rutgers University
Title: Nonlinear Multi-Study Factor Analysis

Abstract: High-dimensional data often exhibit variation that can be captured by lower-dimensional factors. For high-dimensional data from multiple studies or environments, one goal is to understand which underlying factors are common to all studies and which are study- or environment-specific. As a concrete example, we consider platelet gene expression data from patients in different disease groups. In these data, factors correspond to clusters of co-expressed genes; we may expect some clusters (or biological pathways) to be active across all diseases, while others are active only for a specific disease. To learn these factors, we consider a nonlinear multi-study factor model that allows for both shared and study-specific factors. To fit this model, we propose a multi-study sparse variational autoencoder. The underlying model is sparse in that each observed feature (i.e., each dimension of the data) depends on only a small subset of the latent factors. In the genomics example, this means each gene is active in only a few biological processes. Further, the model implicitly induces a penalty on the number of latent factors, which helps separate the shared factors from the group-specific factors. We prove that the shared factors are identified, and we demonstrate that our method recovers meaningful factors in the platelet gene expression data.
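
To make the data structure described in the abstract concrete, here is a hypothetical simulation sketch. It is not the speaker's model or code, and it does not fit anything (the multi-study sparse VAE is not shown). The tanh link, Gaussian factors, random binary mask, and all names and parameters (`simulate_multi_study`, `density`, `noise_sd`) are illustrative assumptions. Every study draws the shared factors plus its own specific factors, and each feature depends only on the few factors its mask row switches on.

```python
import math
import random

def simulate_multi_study(n_per_study=(100, 100), n_features=20,
                         n_shared=2, n_specific=1, density=0.2,
                         noise_sd=0.1, seed=0):
    """Hypothetical data-generating sketch (illustrative assumptions only)."""
    rng = random.Random(seed)
    n_studies = len(n_per_study)
    # Factor layout: [shared factors | study 0 factors | study 1 factors | ...]
    n_factors = n_shared + n_studies * n_specific
    # Sparse mask: mask[j][k] is True if feature j depends on factor k.
    mask = [[rng.random() < density for _ in range(n_factors)]
            for _ in range(n_features)]
    # Loadings are zero wherever the mask is off, so each feature
    # only "sees" its own small subset of factors.
    loadings = [[rng.gauss(0.0, 1.0) if mask[j][k] else 0.0
                 for k in range(n_factors)] for j in range(n_features)]
    X, Z, labels = [], [], []
    for s, n in enumerate(n_per_study):
        own = range(n_shared + s * n_specific, n_shared + (s + 1) * n_specific)
        for _ in range(n):
            # Shared factors are drawn for every study; study-specific
            # factors are drawn only for study s and are zero elsewhere.
            z = [rng.gauss(0.0, 1.0) if (k < n_shared or k in own) else 0.0
                 for k in range(n_factors)]
            # Assumed nonlinearity: tanh of a sparse linear combination, plus noise.
            x = [math.tanh(sum(w * zk for w, zk in zip(loadings[j], z)))
                 + rng.gauss(0.0, noise_sd) for j in range(n_features)]
            X.append(x)
            Z.append(z)
            labels.append(s)
    return X, Z, labels, mask
```

With the defaults there are two studies, two shared factors (columns 0 and 1 of `Z`), and one specific factor per study (column 2 for study 0, column 3 for study 1). The goal the abstract describes is, roughly, to recover this split from `X` and the study labels alone.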

Bio: Gemma Moran is an Assistant Professor of Statistics at Rutgers. Previously, she was a postdoc at the Columbia Data Science Institute, working with Dave Blei. She received her PhD in Statistics from the University of Pennsylvania, advised by Ed George and Veronika Rockova.