Making Your Clean Data Big: Scalable Algorithms for Data Quality Problems
Barna Saha, AT&T - Shannon Labs
Date: Thursday, January 16, 2014
Time: 4:00 PM to 5:00 PM (Note: all times are in the Eastern Time Zone)
Host: Samuel Madden
Contact: Sheila M. Marian, 617-253-1996, email@example.com
Speaker URL: None
TALK: Making Your Clean Data Big: Scalable Algorithms for Data Quality Problems
In our Big Data era, data is being generated, collected, and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Recent studies have shown that poor-quality data is prevalent in large databases and on the Web. Since poor-quality data can have serious consequences for the results of data analyses, the importance of veracity, the fourth V of big data, is increasingly being recognized.
In this talk, we will consider two scenarios of data quality issues, arising in infrastructure network data and in web XML data. We will show how one can develop fast, near-linear-time algorithms to detect these data errors and/or repair them efficiently.
Bio: Barna Saha is a Senior Member of Technical Staff Research at AT&T-Shannon Labs, which she joined after graduating from the University of Maryland, College Park, in fall 2011. Her research interests span algorithm design and analysis, discrete optimization, and foundational aspects of databases and data management. She received the Dean's Dissertation Fellowship Award from the University of Maryland, College Park, for excellence in Ph.D. research and the Academic Excellence Award from the Indian Institute of Technology, Kanpur. She is the recipient of the best paper award at the Very Large Data Bases Conference (VLDB'09) for her work on uncertain ranking.
Created by Sheila M. Marian on Friday, January 10, 2014 at 2:40 PM.