The Statistics Seminar speaker for Wednesday, November 20, 2019, will be Xiaohui Chen, an associate professor in the Department of Statistics at the University of Illinois at Urbana-Champaign (UIUC). Chen received a Ph.D. in Electrical and Computer Engineering in 2013 from the University of British Columbia (UBC), Vancouver, Canada. He was a post-doctoral fellow at the Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the University of Chicago campus. In 2013 he joined UIUC as an Assistant Professor of Statistics, and he has been an Associate Professor of Statistics there since 2019. During 2019-2020 he is visiting the Institute for Data, Systems, and Society (IDSS) at the Massachusetts Institute of Technology (MIT). He has received numerous notable awards, including the NSF CAREER Award in 2018, the Arnold O. Beckman Award at UIUC in 2018, and the ICSA Outstanding Young Researcher Award in 2019.
Talk: Diffusion K-means clustering on manifolds: provable exact recovery via semidefinite relaxations
Abstract: We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph, whose vertices are data points randomly sampled on the manifolds and whose edges are similarities given by a kernel that captures the local geometry of the manifolds. Thus the diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm via semidefinite programming (SDP) to solve the diffusion K-means. We also propose a nuclear norm (i.e., trace norm) regularized SDP that is adaptive to the number of clusters. In both cases, we show that exact recovery of the SDPs for diffusion K-means can be achieved under suitable between-cluster separability and within-cluster connectedness of the submanifolds, which together quantify the hardness of the manifold clustering problem. We further propose the localized diffusion K-means, which uses a locally adaptive bandwidth estimated from the nearest neighbors. We show that exact recovery of the localized diffusion K-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds.
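For readers unfamiliar with the diffusion distance mentioned in the abstract, the following is a minimal Python sketch of the textbook construction: a Gaussian-kernel similarity graph, its random walk, and the resulting pairwise diffusion distances. The function name `diffusion_distances`, the Gaussian kernel, and the `bandwidth` and `steps` parameters are assumptions chosen for illustration and may differ from the formulation in the talk. The SDP relaxation and the localized bandwidth are not shown.

```python
import numpy as np


def diffusion_distances(points, bandwidth=1.0, steps=3):
    """Pairwise diffusion distances for a Gaussian-kernel random walk.

    Illustrative only: kernel choice, bandwidth, and number of steps
    are assumptions, not the speaker's exact setup.
    """
    points = np.asarray(points, dtype=float)
    # Squared Euclidean distances between all pairs of points.
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel similarities (edge weights of the similarity graph).
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    degrees = kernel.sum(axis=1)
    # Row-normalize to obtain the random-walk transition matrix.
    transition = kernel / degrees[:, None]
    # Transition probabilities after `steps` steps of the walk.
    transition_t = np.linalg.matrix_power(transition, steps)
    # The stationary distribution of this walk is proportional to the degrees.
    stationary = degrees / degrees.sum()
    # Diffusion distance: L2 distance between rows of P^t,
    # weighted by the inverse stationary distribution.
    diff = transition_t[:, None, :] - transition_t[None, :, :]
    return np.sqrt(np.sum(diff ** 2 / stationary[None, None, :], axis=-1))
```

On data with two well-separated groups, points in the same group end up much closer in diffusion distance than points in different groups, which is the within-cluster connectedness that a K-means objective on these distances would exploit.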