BEGIN:VCALENDAR
VERSION:2.0
PRODID:icalendar-ruby
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
DTSTART:20190310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20191103T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200601T190353Z
UID:41de567b-e218-40c1-ba59-5a23416a6d9a
DTSTART;TZID=America/New_York:20190515T150000
DTEND;TZID=America/New_York:20190515T160000
CREATED:20190502T162316Z
DESCRIPTION:We consider massive distributed datasets that consist of elemen
ts that are key-value pairs. Our goal is to compute estimates of statistic
s or aggregates over the data\, where the contribution of each key is weig
hted by a function of its frequency (sum of values of its elements). This
fundamental problem has a wealth of applications in data analytics and mac
hine learning.\n\nA common approach is to maintain a sample of keys and es
timate statistics from the sample. Ideally\, to obtain low-variance estima
tes we sample keys with probabilities proportional to their contributions.
One simple way to do so is to first aggregate the raw data to produce a t
able of keys and their frequencies\, apply our function to the frequency v
alues\, and then apply a weighted sampling scheme. This aggregation howeve
r requires data structures of size proportional to the number of distinct
keys and is too costly when the number is very large. Our main contributio
n is the design of composable sampling sketches that can be tailored to an
y concave sublinear function of the frequencies (including log\, the momen
ts x^p for 0