Clustering Mixtures with Almost Optimal Separation in Polynomial Time

Speaker: Jerry Li , Microsoft Research Redmond

Date: Tuesday, May 10, 2022

Time: 4:00 PM to 5:00 PM Note: all times are in the Eastern Time Zone

Public: Yes

Location: 32-G449 (Kiva)

Event Type: Seminar

Room Description: 32-G449 (Kiva)

Host: Sam Hopkins, CSAIL MIT

Contact: Nathan Higgins, nhiggins@csail.mit.edu

Relevant URL:

Speaker URL: None

Speaker Photo:
None

Reminders to: theory-seminars@lists.csail.mit.edu, seminars@lists.csail.mit.edu

Reminder Subject: TALK: Clustering Mixtures with Almost Optimal Separation in Polynomial Time

We consider the problem of clustering a mixture of Gaussians in high dimensions, one of the most well-studied tasks in clustering and high dimensional distribution learning. In the most basic setting, we receive samples from a mixture of k Gaussians in d dimensions, each with identity covariance, and with the promise that the means of the Gaussians have pairwise separation at least Delta. The goal is to efficiently recover a near-perfect clustering of the samples, in polynomial time and sample complexity. It is folklore that this problem is information-theoretically possible as long as Delta >> sqrt(log k), however, all known algorithmic methods for this regime require at least quasi-polynomial runtime and sample complexity. All of these methods hit the same basic roadblock: it seems that efficient methods for solving this problem need to look at all moments of degree at least polylogarithmically many moments; however, even writing down the moment tensor at that degree requires quasi-polynomial runtime.

In this work, we circumvent this barrier, and present the first fully polynomial time algorithm for this problem. Perhaps surprisingly, our method still goes via the method of moments, but we avoid writing down the full moment tensor using a recursively constructed implicit representation of the moment tensor. This allows us to have restricted access to the moment tensor in polynomial time, which we demonstrate is sufficient to solve the full problem. We also demonstrate that our techniques generalize to the case where the components are all shifts of the same distribution, as long as that distribution satisfies the Poincare inequality.

Research Areas:
Algorithms & Theory

Impact Areas:

See other events that are part of the Theory of Computation (ToC) Seminar 2022.

Created by Nathan Higgins Email at Wednesday, April 13, 2022 at 4:13 PM.