Analytical Bootstrap Method for Fast Error Estimation in Approximate Query Processing

Speaker: Kai Zeng, UCLA

Date: Wednesday, January 08, 2014

Time: 4:00 PM to 5:00 PM (all times are in the Eastern Time Zone)

Refreshments: 3:45 PM

Public: Yes

Location: 32-G449

Host: Samuel Madden and Michael Stonebraker

Contact: Sheila M. Marian, 617-253-1996, sheila@csail.mit.edu

Reminders to: seminars@csail.mit.edu

Abstract:

Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research made more critical by the need for timely and cost-effective analytics over “Big Data”. Assessing the quality of approximate answers (i.e., estimating their error) is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second is quite general but suffers from high computational overhead.
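To make the trade-off between the two traditional approaches concrete, here is a minimal sketch in Python that estimates the error of a sampled average both ways. It is an illustration only and not the method presented in the talk: the data is synthetic, and the function names, the sample size of 1,000, and the 500 resamples are all assumptions chosen for the example.

```python
import math
import random
import statistics


def analytic_standard_error(sample):
    """Closed-form standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def bootstrap_standard_error(sample, num_resamples=500, seed=0):
    """Standard error of the sample mean, estimated by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    resampled_means = []
    for _ in range(num_resamples):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        resampled_means.append(statistics.fmean(resample))
    # The spread of the resampled means approximates the estimator's error.
    return statistics.stdev(resampled_means)


# Hypothetical data: 1,000 values standing in for a sampled table column.
data_rng = random.Random(42)
sample = [data_rng.gauss(100.0, 15.0) for _ in range(1000)]

analytic = analytic_standard_error(sample)
bootstrap = bootstrap_standard_error(sample)
print(f"analytic SE:  {analytic:.3f}")
print(f"bootstrap SE: {bootstrap:.3f}")
```

The closed-form estimate needs one pass over the sample but applies only to aggregates with a known error formula, such as an average. The bootstrap works for any function of the sample, but here it re-evaluates the aggregate 500 times, which is the overhead the abstract refers to.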

In this talk, I will introduce a probabilistic relational model for the bootstrap process, along with rigorous semantics and a unified error model, which bridges the gap between these two traditional approaches. Based on this probabilistic framework, we develop efficient algorithms, namely the analytical bootstrap, to predict the error distribution of the approximate results. These enable the computation of any bootstrap-based quality measure for a large class of SQL queries via a single-round evaluation of a slightly modified query. Extensive experiments on both synthetic and real-world datasets show that the analytical bootstrap has superior prediction accuracy for bootstrap-based quality measures, and is several orders of magnitude faster than the bootstrap.

This event is not part of a series.

Created by Sheila M. Marian on Tuesday, January 07, 2014 at 11:39 AM.