Sampling Sketches for Concave Sublinear Functions of Frequencies

Speaker: Ofir Geri , Stanford

Date: Wednesday, May 15, 2019

Time: 3:00 PM to 4:00 PM

Public: Yes

Location: 32-G575

Event Type: Seminar

Room Description:

Host: Govind Ramnarayan, Quanquan Liu, Sitan Chen

Contact: Rebecca Yadegar, ryadegar@csail.mit.edu

Relevant URL:

Speaker URL: None

Speaker Photo:
None

Reminders to: seminars@csail.mit.edu, theory-seminars@csail.mit.edu

Reminder Subject: TALK: Ofir Geri: Sampling Sketches for Concave Sublinear Functions of Frequencies

We consider massive distributed datasets that consist of elements that are key-value pairs. Our goal is to compute estimates of statistics or aggregates over the data, where the contribution of each key is weighted by a function of its frequency (sum of values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning.

A common approach is to maintain a sample of keys and estimate statistics from the sample. Ideally, to obtain low-variance estimates we sample keys with probabilities proportional to their contributions. One simple way to do so is to first aggregate the raw data to produce a table of keys and their frequencies, apply our function to the frequency values, and then apply a weighted sampling scheme. This aggregation however requires data structures of size proportional to the number of distinct keys and is too costly when the number is very large. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies (including log, the moments x^p for 0

Research Areas:
Algorithms & Theory

Impact Areas:

See other events that are part of the Algorithms & Complexity Seminars 2018-2019.

Created by Rebecca Yadegar Email at Thursday, May 02, 2019 at 4:23 PM.