Sampling Sketches for Concave Sublinear Functions of Frequencies
Date: Wednesday, May 15, 2019
Time: 3:00 PM to 4:00 PM
Event Type: Seminar
Host: Govind Ramnarayan, Quanquan Liu, Sitan Chen
Contact: Rebecca Yadegar, firstname.lastname@example.org
Speaker URL: None
TALK: Ofir Geri: Sampling Sketches for Concave Sublinear Functions of Frequencies
We consider massive distributed datasets that consist of elements that are key-value pairs. Our goal is to compute estimates of statistics or aggregates over the data, where the contribution of each key is weighted by a function of its frequency (sum of values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning.
A common approach is to maintain a sample of keys and estimate statistics from the sample. Ideally, to obtain low-variance estimates we sample keys with probabilities proportional to their contributions. One simple way to do so is to first aggregate the raw data to produce a table of keys and their frequencies, apply our function to the frequency values, and then apply a weighted sampling scheme. This aggregation however requires data structures of size proportional to the number of distinct keys and is too costly when the number is very large. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies (including log, the moments x^p for 0
Algorithms & Theory
Created by Rebecca Yadegar at Thursday, May 02, 2019 at 4:23 PM.