Jelani Nelson: Heavy Hitters via Cluster-Preserving Clustering

Speaker: Jelani Nelson, Harvard

Date: Tuesday, November 15, 2016

Time: 4:00 PM to 5:00 PM Note: all times are in the Eastern Time Zone

Refreshments: 3:45 PM

Public: Yes

Location: Patil/Kiva G449

Event Type:

Room Description:

Host: Aleksander Madry

Contact: Deborah Goodwin, 617.324.7303, dlehto@csail.mit.edu

Relevant URL:

Speaker URL: None

Speaker Photo:
None

Reminders to: seminars@csail.mit.edu, theory-seminars@csail.mit.edu

Reminder Subject: TALK: Jelani Nelson: Heavy Hitters via Cluster-Preserving Clustering

Abstract: In the "heavy hitters" or "frequent items" problem, one must process a
stream of items and report those items that occur frequently. For
example, a telecommunications company may wish to find popular
destination IP addresses in a packet stream across one of their links,
or a search engine may wish to report popular query words. A more
general problem is when there are two streams and we must report those
items whose frequencies significantly deviate between them; the former
problem is a special case since we can artificially pretend the first
of the two streams the empty stream. Such a problem naturally arises
in trend detection and anomaly detection. Several algorithms were
known in the literature solving these problems, such as the CountMin
sketch, CountSketch, Hierarchical CountSketch and others. The goal in
designing such algorithms is (1) to guarantee finding frequent items
for as lax a definition of "frequent" as possible while still limiting
output size, and while having low (2) memory consumption, (3)
processing time per stream item, (4) query time to report the list of
frequent times, and (5) failure probability. Previous solutions could
perform well on various subsets of these metrics, but not on all 5
simultaneously.

We design a new algorithm, ExpanderSketch, which performs well on all
5. Our main innovation is a novel reduction to a new graph-clustering
problem we formulate, in which finding most of every cluster
guarantees finding all the frequent items. We then solve this problem
by devising a novel spectral clustering algorithm potentially of
independent interest, based on divide-and-conquer and local search.

Research Areas:

Impact Areas:

See other events that are part of the Theory of Computation (TOC) Seminar Series 2016.

Created by Deborah Goodwin Email at Wednesday, November 09, 2016 at 12:41 PM.