Time: 4:30-5:30 p.m.
Date: Wednesday, September 10, 2025
Speaker: Florentina Bunea
Title: From softmax mixture ensembles to mixtures of experts, with applications to LLM output summarization
 


Abstract: Contemporary LLMs have billions of parameters, making them impossible to interpret or to use directly in downstream analyses. Summarizing LLM output with models that strike a balance between complexity and interpretability is therefore an immediate need. This talk will introduce a simple mixture-of-experts (MoE) model for representing contextually embedded corpora. Fitting and analyzing this MoE will be shown to reduce to the analysis of softmax mixture ensemble models, after an appropriate quantization of the corpus into p feature vectors embedded in ℝ^L, for large L. Given a collection of discrete data samples, we identify each sample with a distribution in the probability simplex Δ_p, supported on p points in ℝ^L. The softmax mixture ensemble model postulates that the distribution of each sample is a K-mixture of softmax (multinomial logit) distributions, for K ≥ 2, with softmax parameters common to the ensemble and sample-specific weights. Softmax mixture ensembles are of interest in their own right, beyond LLM applications, as they are instances of discrete choice models, widely used in econometrics, among other areas. Despite their applicability and increasingly recognized potential, theory and methods for this model class are heavily underdeveloped.

This talk will present solutions to open problems, with a focus on parameter estimation. We provide the first analysis of identifiability in softmax mixtures; of note is that, for the ensemble model, we can exhibit testable identifiability conditions. We lay the theoretical foundations for parameter estimation in softmax mixtures by providing the first theoretical analysis of the Expectation-Maximization (EM) algorithm in this model. We give a precise characterization of the size of the initialization neighborhood within which the mixture atoms can be estimated at parametric rates, in a number of iterations that is logarithmic in the sample size. We make use of a novel method-of-moments procedure to estimate the K-dimensional subspace of ℝ^L spanned by the K mixture parameters. As a corollary, we show that EM with a random start drawn from this estimated subspace leads to optimal atom estimators in only exp(K) draws (relative to the typical, huge, exp(L)). As an important feature of our analysis, we further show that the cross-entropy estimates of the mixture weights are exactly sparse, without the need for extra regularization, and we also provide one-step corrected mixture weight estimates that are asymptotically normal, and thus amenable to statistical inference.

The totality of these results provides a solution to the LLM representation problem. The estimated mixture atoms, which are common to the corpus, and the estimated mixture weights of each document readily yield an estimated mixing measure that can serve as a sample-level (document) summary, while their average yields a corpus-level summary. These summaries can be used in downstream tasks involving LLM output evaluation and comparison, with the added advantage of parameter interpretability, typically lacking in existing summarization strategies. I will illustrate these theoretical and methodological results using a running data example that clarifies the net benefits of the MoE-type representation of an LLM-embedded corpus, relative to a standard topic model-type representation of the same corpus, noting that, by definition, topic models cannot make use of contextual text embeddings.
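For concreteness, here is a minimal sketch of the softmax mixture ensemble model described in the abstract, with notation chosen for illustration only (the talk's notation may differ). After quantization, the corpus is summarized by p feature vectors x_1, …, x_p ∈ ℝ^L, and document i is identified with a distribution on these p points whose j-th probability is modeled as

    P_i(j) = Σ_{k=1}^{K} π_{ik} · exp(⟨β_k, x_j⟩) / Σ_{m=1}^{p} exp(⟨β_k, x_m⟩),   j = 1, …, p,

where the softmax parameters (mixture atoms) β_1, …, β_K ∈ ℝ^L are common to the whole corpus and the weight vector π_i = (π_{i1}, …, π_{iK}), lying in the simplex Δ_K, is specific to document i. On this reading, the estimated atoms together with the estimated weights of document i form a discrete mixing measure (mass π_{ik} placed at β_k) that serves as the document-level summary, and averaging these measures over documents gives the corpus-level summary described above.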

Bio: Florentina Bunea is a professor of statistics and data science at Cornell University and a member of the graduate fields of statistics, applied mathematics, and computer science. Her research is broadly centered on statistical machine learning theory and high-dimensional statistical inference. She is interested in developing new methodology, accompanied by sharp theory, for solving a variety of problems in data science and in the growing area of AI output evaluation. She continues to be interested in the general areas of mixture modeling, latent space estimation, sparsity and dimension reduction in high dimensions, and statistical optimal transport, as well as their applications, most recently to large language models and immunology, among others.

She is a fellow of the Institute of Mathematical Statistics (IMS) and an IMS Medallion Award recipient. She has served or is currently serving as an associate editor for a number of journals, including the Annals of Statistics, Bernoulli, JASA, JRSS-B, EJS, and the Annals of Applied Statistics. She is a co-editor for the Chapman and Hall Statistics and Applied Probability Monograph Series. She is also a member of Cornell Bowers’ Diversity, Equity, Inclusion, and Belonging (DEIB) council, working to promote the diversity of the workforce in data science disciplines.