Clustering Mixtures with Almost Optimal Separation in Polynomial Time
, Microsoft Research Redmond
Date: Tuesday, May 10, 2022
Time: 4:00 PM to 5:00 PM Note: all times are in the Eastern Time Zone
Location: 32-G449 (Kiva)
Event Type: Seminar
Room Description: 32-G449 (Kiva)
Host: Sam Hopkins, CSAIL MIT
Contact: Nathan Higgins, email@example.com
Speaker URL: None
TALK: Clustering Mixtures with Almost Optimal Separation in Polynomial Time
We consider the problem of clustering a mixture of Gaussians in high dimensions, one of the most well-studied tasks in clustering and high dimensional distribution learning. In the most basic setting, we receive samples from a mixture of k Gaussians in d dimensions, each with identity covariance, and with the promise that the means of the Gaussians have pairwise separation at least Delta. The goal is to efficiently recover a near-perfect clustering of the samples, in polynomial time and sample complexity. It is folklore that this problem is information-theoretically possible as long as Delta >> sqrt(log k), however, all known algorithmic methods for this regime require at least quasi-polynomial runtime and sample complexity. All of these methods hit the same basic roadblock: it seems that efficient methods for solving this problem need to look at all moments of degree at least polylogarithmically many moments; however, even writing down the moment tensor at that degree requires quasi-polynomial runtime.
In this work, we circumvent this barrier, and present the first fully polynomial time algorithm for this problem. Perhaps surprisingly, our method still goes via the method of moments, but we avoid writing down the full moment tensor using a recursively constructed implicit representation of the moment tensor. This allows us to have restricted access to the moment tensor in polynomial time, which we demonstrate is sufficient to solve the full problem. We also demonstrate that our techniques generalize to the case where the components are all shifts of the same distribution, as long as that distribution satisfies the Poincare inequality.
Algorithms & Theory
Created by Nathan Higgins at Wednesday, April 13, 2022 at 4:13 PM.