- Jelani Nelson: Heavy Hitter...
- Edit Event
- Cancel Event
- Preview Reminder
- Send Reminder
- Other events happening in November 2016
Jelani Nelson: Heavy Hitters via Cluster-Preserving Clustering
Speaker:
Jelani Nelson, Harvard
Date: Tuesday, November 15, 2016
Time: 4:00 PM to 5:00 PM Note: all times are in the Eastern Time Zone
Refreshments: 3:45 PM
Public: Yes
Location: Patil/Kiva G449
Event Type:
Room Description:
Host: Aleksander Madry
Contact: Deborah Goodwin, 617.324.7303, dlehto@csail.mit.edu
Speaker URL: None
Speaker Photo:
None
Reminders to:
seminars@csail.mit.edu, theory-seminars@csail.mit.edu
Reminder Subject:
TALK: Jelani Nelson: Heavy Hitters via Cluster-Preserving Clustering
Abstract: In the "heavy hitters" or "frequent items" problem, one must process a
stream of items and report those items that occur frequently. For
example, a telecommunications company may wish to find popular
destination IP addresses in a packet stream across one of their links,
or a search engine may wish to report popular query words. A more
general problem is when there are two streams and we must report those
items whose frequencies significantly deviate between them; the former
problem is a special case since we can artificially pretend the first
of the two streams the empty stream. Such a problem naturally arises
in trend detection and anomaly detection. Several algorithms were
known in the literature solving these problems, such as the CountMin
sketch, CountSketch, Hierarchical CountSketch and others. The goal in
designing such algorithms is (1) to guarantee finding frequent items
for as lax a definition of "frequent" as possible while still limiting
output size, and while having low (2) memory consumption, (3)
processing time per stream item, (4) query time to report the list of
frequent times, and (5) failure probability. Previous solutions could
perform well on various subsets of these metrics, but not on all 5
simultaneously.
We design a new algorithm, ExpanderSketch, which performs well on all
5. Our main innovation is a novel reduction to a new graph-clustering
problem we formulate, in which finding most of every cluster
guarantees finding all the frequent items. We then solve this problem
by devising a novel spectral clustering algorithm potentially of
independent interest, based on divide-and-conquer and local search.
Research Areas:
Impact Areas:
Created by Deborah Goodwin at Wednesday, November 09, 2016 at 12:41 PM.