- Harry Lang: Coresets for k-...
- Edit Event
- Cancel Event
- Preview Reminder
- Send Reminder
- Other events happening in July 2017
Harry Lang: Coresets for k-Means-Type Problems on Streams
Speaker:
Harry Lang
, Johns Hopkins University
Date: Wednesday, July 05, 2017
Time: 4:00 PM to 5:00 PM Note: all times are in the Eastern Time Zone
Public: Yes
Location: 32-G575
Event Type:
Room Description:
Host: Pritish Kamath and Akshay Degwekar, CSAIL MIT
Contact: Rebecca Yadegar, ryadegar@csail.mit.edu
Speaker URL: None
Speaker Photo:
None
Reminders to:
seminars@csail.mit.edu, theory-seminars@csail.mit.edu
Reminder Subject:
TALK: RESCHEDULED: July 5, 2017: Harry Lang: Coresets for k-Means-Type Problems on Streams
Abstract: Let f be a non-negative symmetric dissimilarity measure. Given sets of points P (the input data) and C (a "query"), we define F(P,C) = \sum_{p \in P} \min_{c \in C} f(p,c). An example that fits into this framework is k-means clustering where f = distance^2 and we restrict |C| = k.
An $\epsilon$-coreset for P is a set Q such that F(Q,C) is within a factor of $(1 + \epsilon)$ of F(P,C) for any query C. For data reduction, the goal is to construct a coreset Q of minimal cardinality. We introduce improved coreset constructions for a wide class of functions that obey a weak form of the triangle inequality: this includes k-means, k-median, Tukey's bi-weight function, and the Huber loss function.
In the streaming model, the input P (a set of n points) arrives sequentially and algorithms attempt to maintain a coreset using a minimal amount of memory. We introduce a new technique for maintaining coresets in the streaming model with O(1) overhead. For example, the previous state-of-the-art for maintaining an $\epsilon$-coreset for metric k-means on a stream required the storage of $O(\epsilon^{-4} k \log k \log^6 n)$ points whereas our method requires $O(\epsilon^{-2} k \log k \log n)$ points.
Joint work with Vladimir Braverman and Dan Feldman.
Research Areas:
Impact Areas:
Created by Rebecca Yadegar at Monday, June 05, 2017 at 4:50 PM.