Cameron Musco: Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Speaker: Cameron Musco

Date: Wednesday, April 01, 2015

Time: 4:00 PM to 5:00 PM Note: all times are in the Eastern Time Zone

Public: Yes

Location: 32-G575

Event Type:

Room Description:

Host: Ilya Razenshteyn

Contact: Rebecca Yadegar,

Relevant URL:

Speaker URL: None

Speaker Photo:

Reminders to:,

Reminder Subject: TALK: Cameron Musco:Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Abstract: We show how to approximate a data matrix with a much smaller sketch that can be used to solve a general class of constrained k-rank approximation problems. Importantly, this class of problems includes k-means clustering and unconstrained low rank approximation (i.e. principle component analysis). By reducing data points to just O(k) dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems.

For k-means dimensionality reduction, we provide the first (1 + epsilon) relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate PCA, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only ‘cover’ a good subspace for A, but can be used directly to compute this subspace. Finally, for k-means clustering, we show how to achieve a (9 +epsilon) approximation by randomly projecting data points to just O(log k/epsilon^2) dimensions.

In this talk I will discuss the view of k-means clustering as constrained low rank approximation, clarify connections between k-means clustering and PCA, overview our main proof approach, and explain the applications of our results.

Joint work with Michael Cohen, Sam Elder, Chris Musco, and Madalina Persu.

Research Areas:

Impact Areas:

See other events that are part of the Algorithms and Complexity Series Fall 2014 / Spring 2015.

Created by Rebecca Yadegar Email at Monday, March 30, 2015 at 2:34 PM.