Cornell University
Cornell Statistics and Data Science
Statistics Seminar Speaker: Nhat Ho, 2/12/2020

Wednesday Feb 12 2020

4:15pm @ G01 Biotechnology
In Statistics Seminars

The Statistics Seminar speaker for Wednesday, February 12, 2020, is Nhat Ho, a postdoctoral fellow in the Electrical Engineering and Computer Science (EECS) Department at the University of California, Berkeley, where he is supervised by Professor Michael I. Jordan and Professor Martin J. Wainwright. Before going to Berkeley, he completed his Ph.D. in 2017 in the Department of Statistics at the University of Michigan, Ann Arbor, where he was advised by Professor Long Nguyen and Professor Ya’acov Ritov. His current research focuses on the interplay of four principles of statistics and data science: heterogeneity of data, interpretability of models, stability, and scalability of optimization and sampling algorithms.

Talk: Statistical and computational perspectives on latent variable models

Abstract: The growth in scope and complexity of modern data sets presents the field of statistics and data science with numerous inferential and computational challenges, among them how to deal with various forms of heterogeneity. Latent variable models provide a principled approach to modeling heterogeneous collections of data. However, because these models are often over-parameterized, their parameter estimates and latent structures have been observed to exhibit non-standard statistical and computational behavior. In this talk, we provide new insights into this behavior for mixture models, a building block of latent variable models.

From the statistical viewpoint, we propose a general framework for studying the convergence rates of parameter estimation in mixture models based on Wasserstein distance. Our study makes explicit the links between model singularities, parameter estimation convergence rates, and the algebraic geometry of the parameter space for mixtures of continuous distributions.
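For readers unfamiliar with the metric, the abstract does not state the definition it uses; the form below is the standard Wasserstein distance between two discrete mixing measures, given here as an assumption about what the talk refers to. The symbols $G$, $G'$, $p_i$, $p'_j$, $\theta_i$, $\theta'_j$, $q$, and the order $r$ are notation introduced for this illustration only.

```latex
% Assumed notation: two discrete mixing measures with atoms theta_i, theta'_j
% and weights p_i, p'_j (not taken from the abstract).
\[
  G = \sum_{i=1}^{k} p_i \,\delta_{\theta_i},
  \qquad
  G' = \sum_{j=1}^{k'} p'_j \,\delta_{\theta'_j},
\]
\[
  W_r(G, G') =
  \left(
    \inf_{q \in \mathcal{Q}(p, p')}
    \sum_{i=1}^{k} \sum_{j=1}^{k'} q_{ij}\, \lVert \theta_i - \theta'_j \rVert^{r}
  \right)^{1/r},
  \qquad r \ge 1,
\]
% Q(p, p') is the set of couplings: nonnegative matrices q whose rows sum
% to p_i and whose columns sum to p'_j.
\[
  \mathcal{Q}(p, p') =
  \Bigl\{ q \in [0,1]^{k \times k'} :
    \sum_{j} q_{ij} = p_i,\ \sum_{i} q_{ij} = p'_j \Bigr\}.
\]
```

Because the distance compares atoms and weights jointly, a convergence rate for $W_r$ between an estimated and a true mixing measure translates into rates for the individual component parameters, which is one reason this metric is convenient for mixture models.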

From the computational side, we study the non-asymptotic behavior of the EM algorithm under the over-specified settings of mixture models in which the likelihood need not be strongly concave, or, equivalently, the Fisher information matrix might be singular. Focusing on the simple setting of a two-component mixture fit with equal mixture weights to a multivariate Gaussian distribution, we demonstrate that EM updates converge to a fixed point at Euclidean distance $O((d/n)^{1/4})$ from the true parameter after $O((n/d)^{1/2})$ steps, where $d$ is the dimension and $n$ is the sample size.
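The over-specified setting described above can be explored numerically. The sketch below is an illustration under stated assumptions, not the speaker's code: it assumes the fitted model is the symmetric location mixture 0.5·N(θ, I) + 0.5·N(−θ, I) with known identity covariance, and that the data come from a single standard Gaussian N(0, I), so the true parameter is θ = 0. Under that assumed model the EM update has the closed form θ ← (1/n) Σ tanh(xᵢᵀθ) xᵢ. The function names, sample size, dimension, seed, and the constant 10 in the step count are all hypothetical choices.

```python
import numpy as np


def em_update(theta, X):
    """One EM step for the assumed model 0.5*N(theta, I) + 0.5*N(-theta, I).

    The E-step posterior weight of the +theta component is
    1 / (1 + exp(-2 * x @ theta)), so (2 * weight - 1) = tanh(x @ theta);
    the M-step then averages tanh(x_i @ theta) * x_i over the sample.
    """
    return (np.tanh(X @ theta)[:, None] * X).mean(axis=0)


def run_em(X, theta0, n_steps):
    """Iterate the EM update and record the norm of each iterate."""
    theta = theta0.copy()
    norms = [np.linalg.norm(theta)]
    for _ in range(n_steps):
        theta = em_update(theta, X)
        norms.append(np.linalg.norm(theta))
    return theta, norms


# Hypothetical experiment: the data are a single Gaussian, so the
# two-component fit is over-specified and the true parameter is zero.
rng = np.random.default_rng(0)
n, d = 20000, 2
X = rng.standard_normal((n, d))

theta0 = np.ones(d)
n_steps = int(10 * np.sqrt(n / d))  # a constant multiple of (n/d)^{1/2}
theta_hat, norms = run_em(X, theta0, n_steps)

print("distance of final iterate from 0:", norms[-1])
print("(d/n)^(1/4) for comparison:      ", (d / n) ** 0.25)
```

Since the true parameter is zero here, the norm of the final iterate is its Euclidean distance from the truth, and it can be compared against (d/n)^{1/4}. Repeating the run for several values of n and d is one way to check informally whether the scaling quoted in the abstract shows up in simulation.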

From the methodological standpoint, we develop computationally efficient optimization-based methods for the multilevel clustering problem based on Wasserstein distance. Experimental results with large-scale real-world datasets demonstrate the flexibility and scalability of our approaches. If time allows, we will also discuss a novel post-processing procedure, the Merge-Truncate-Merge algorithm, for determining the true number of components in a wide class of latent variable models.

© Cornell University Department of Statistics and Data Science

1198 Comstock Hall, 129 Garden Ave., Ithaca, NY 14853
