Mark Sellke is an Assistant Professor of Statistics at Harvard. He completed his PhD at Stanford and his undergraduate degree at MIT. Mark's research interests include high-dimensional probability and statistics, optimization, and machine learning. His work has been recognized with the best paper award at SODA 2020 and the outstanding paper award at NeurIPS 2021.
Talk: Nonparametric MLE for Gaussian Location Mixtures: Efficient Approximation and Generic Behavior
Abstract: We study the nonparametric maximum likelihood estimator (NPMLE) for Gaussian location mixtures in one dimension. It has been known since Lindsay (1983) that given an n-point dataset, this estimator always returns a mixture with at most n components; more recently, Polyanskiy and Wu (2020) gave an optimal logarithmic bound for subgaussian data. In this work we study computational and structural aspects of the NPMLE. We show that the number of components, an integer-valued function of the data, is efficiently computable with probability 1 when the data are independent with absolutely continuous law and lie in an interval of length O(n^{1/4}). Consequently, we can compute an ε-approximation to the NPMLE in Wasserstein distance in poly(1/ε) time for small ε. Along the way, we show that the NPMLE exhibits "generic" behavior: conditional on having k atoms, its law admits a density on the appropriate (2k-1)-dimensional parameter space for all k ≤ √n/2. Additionally, the KKT conditions of the associated variational problem almost surely hold with strict inequality. A classical Fourier-analytic estimate for non-degenerate curves is key to our analysis.
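For readers who want the variational problem behind the abstract spelled out, the following is the standard formulation of the Gaussian-location NPMLE together with Lindsay's first-order (KKT) optimality condition, as they appear throughout this literature. The notation (φ for the standard Gaussian density, D for the gradient function) is chosen here for exposition and is not quoted from the talk.

% Classical setup: the NPMLE maximizes the mixture log-likelihood
% over all mixing distributions (an infinite-dimensional convex program).
\[
  \hat{\pi} \in \operatorname*{arg\,max}_{\pi \in \mathcal{P}(\mathbb{R})}
  \frac{1}{n} \sum_{i=1}^{n} \log \int_{\mathbb{R}} \varphi(X_i - \mu)\, \mathrm{d}\pi(\mu),
  \qquad \varphi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}.
\]
% Lindsay's (1983) optimality condition: writing
% \hat{f}(x) = \int \varphi(x - \mu)\, \mathrm{d}\hat{\pi}(\mu) for the fitted
% mixture density, the gradient function
\[
  D_{\hat{\pi}}(\mu) := \frac{1}{n} \sum_{i=1}^{n}
  \frac{\varphi(X_i - \mu)}{\hat{f}(X_i)} \;\le\; 1
  \quad \text{for all } \mu \in \mathbb{R},
\]
% with equality on the support of \hat{\pi}. The abstract's "strict
% inequality" statement concerns these conditions holding strictly
% wherever they are not forced to be equalities.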
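To make the computational side concrete, below is a minimal sketch of the classical fixed-grid EM approach to approximating the NPMLE. This is emphatically not the certified algorithm from the talk; it is the standard multiplicative (EM) update for the mixing weights on a fixed grid of candidate atoms, and the function name npmle_grid_em is hypothetical.

import numpy as np

def npmle_grid_em(x, grid_size=200, n_iter=500):
    """Approximate the Gaussian-location-mixture NPMLE on a fixed grid.

    Illustrative sketch of the classical fixed-grid EM update,
    not the algorithm presented in the talk.
    """
    x = np.asarray(x, dtype=float)
    # Candidate atom locations: a fine grid spanning the data range.
    grid = np.linspace(x.min(), x.max(), grid_size)
    # Gaussian likelihood matrix: L[i, j] = phi(x_i - mu_j).
    L = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
    w = np.full(grid_size, 1.0 / grid_size)  # uniform initial weights
    for _ in range(n_iter):
        f = L @ w                            # fitted density at each data point
        w *= (L / f[:, None]).mean(axis=0)   # EM multiplicative update (keeps sum(w) = 1)
    return grid, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two-component truth: atoms at -2 and 2 with equal weight.
    x = rng.choice([-2.0, 2.0], size=500) + rng.standard_normal(500)
    grid, w = npmle_grid_em(x)
    print("heaviest atoms:", grid[np.argsort(w)[-5:]])

The fitted weights concentrate near a small number of grid points, reflecting the few-atom structure guaranteed by Lindsay (1983) and Polyanskiy and Wu (2020). Note what this naive scheme lacks: the grid spacing caps its accuracy, and it provides no certificate for the exact number of atoms; the Wasserstein approximation guarantee and the exact atom count are precisely the gaps the talk's results address.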