Making Your Clean Data Big: Scalable Algorithms for Data Quality Problems

Speaker: Barna Saha, AT&T Shannon Labs

Date: Thursday, January 16, 2014

Time: 4:00 PM to 5:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: 32-141

Host: Samuel Madden

Contact: Sheila M. Marian, 617-253-1996, sheila@csail.mit.edu

Reminders to: seminars@csail.mit.edu

Reminder Subject: TALK: Making Your Clean Data Big: Scalable Algorithms for Data Quality Problems

Abstract:
In our Big Data era, data is being generated, collected, and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Recent studies have shown that poor-quality data is prevalent in large databases and on the Web. Because poor-quality data can seriously distort the results of data analyses, the importance of veracity, the fourth ‘V’ of big data, is increasingly being recognized.

In this talk, we will consider two scenarios in which data quality issues arise: infrastructure network data and Web XML data. We will show how to develop fast, near-linear-time algorithms that detect these data errors and/or repair them.

Bio: Barna Saha is a Senior Member of Technical Staff (Research) at AT&T Shannon Labs, which she joined in fall 2011 after graduating from the University of Maryland, College Park. Her research interests span algorithm design and analysis, discrete optimization, and the foundational aspects of databases and data management. She received the Dean's Dissertation Fellowship Award from the University of Maryland, College Park for excellence in Ph.D. research, and the Academic Excellence Award from the Indian Institute of Technology, Kanpur. She received the best paper award at the Very Large Data Bases conference (VLDB'09) for her work on uncertain ranking.

This event is not part of a series.

Created by Sheila M. Marian on Friday, January 10, 2014 at 2:40 PM.