Scalable Data Management for High-Throughput Genomics
Uwe Roehm, The University of Sydney
Date: Thursday, December 12, 2013
Time: 4:00 PM to 5:00 PM (all times are in the Eastern Time Zone)
Refreshments: 3:45 PM
Host: Samuel Madden
Contact: Sheila M. Marian, 617-253-1996, firstname.lastname@example.org
Speaker URL: None
TALK: Scalable Data Management for High-Throughput Genomics
With today's DNA sequencing technology, one can sequence an individual genome within a few days for a fraction of the cost of the original Human Genome Project (an estimated $3 billion over 10 years). The ultimate goal is a personal genome sequenced within a few hours as a hospital lab test, which would revolutionize modern health care and research areas such as cancer and HIV research. This also means that genomics labs are facing several terabytes of data per week that have to be processed efficiently.
This talk explores the potential and the current limitations of using database technology for high-throughput genomics. In particular, we are interested in supporting the initial stages of a typical high-throughput DNA sequencing pipeline. The talk gives an overview of the BioSeqDB project, in which we explored the applicability of extensible databases and SQL for declarative processing of bio-data. One specific result was a new, efficient algorithm for error-correcting raw sequence data, called Blue, which combines statistical methods and scalable data processing algorithms based on k-mer consensus. Blue outperforms existing error-correction algorithms by up to two orders of magnitude in throughput while achieving higher accuracy on both Illumina and 454 data.
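To make the idea of k-mer consensus concrete, the following is a minimal Python sketch of a generic k-mer-frequency error corrector. It is not Blue's actual algorithm, which the abstract does not describe in detail. The function names (count_kmers, correct_read), the min_count threshold, and the restriction to single-base substitutions are all assumptions made for illustration. The sketch counts every k-mer across all reads, treats k-mers seen fewer than min_count times as likely errors, and replaces each one with the most frequent k-mer that differs from it by a single base.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct_read(read, counts, k, min_count=2):
    """Illustrative corrector (hypothetical, substitutions only).

    Scans the read left to right. Any k-mer seen fewer than min_count
    times is replaced by the most frequent k-mer that differs from it
    in exactly one base, if such a k-mer reaches min_count.
    """
    bases = list(read)
    for i in range(len(bases) - k + 1):
        kmer = "".join(bases[i:i + k])
        if counts[kmer] >= min_count:
            continue  # k-mer is well supported, keep it
        best, best_count = None, min_count - 1
        for pos in range(k):
            for base in "ACGT":
                if base == kmer[pos]:
                    continue
                candidate = kmer[:pos] + base + kmer[pos + 1:]
                if counts[candidate] > best_count:
                    best, best_count = candidate, counts[candidate]
        if best is not None:
            bases[i:i + k] = best  # overwrite the weak k-mer in place
    return "".join(bases)


# Toy data: five clean copies of a sequence plus one read with one error.
reference = "ACGTTGCAAGGCTA"
noisy = "ACGTTGCTAGGCTA"
reads = [reference] * 5 + [noisy]
kmer_counts = count_kmers(reads, k=5)
print(correct_read(noisy, kmer_counts, k=5))
```

A production corrector would also need to handle insertions and deletions (common in 454 data), quality scores, and k-mer tables too large for memory, none of which this sketch attempts.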
About the Speaker:
Uwe Roehm is an associate professor of database systems at the University of Sydney. He is a computer science graduate of the University of Passau, Germany, and received his doctoral degree in 2002 from ETH Zurich, Switzerland. His research interests are cloud data management, databases on multicore servers, data replication, and data management for bioinformatics.
Created by Sheila M. Marian on Monday, December 09, 2013 at 2:02 PM.