Scalable Data Management for High-Throughput Genomics
Speaker: Uwe Roehm, The University of Sydney
Date: Thursday, December 12, 2013
Time: 4:00 PM to 5:00 PM (all times are in the Eastern Time Zone)
Refreshments: 3:45 PM
Public: Yes
Location: 32-D463
Host: Samuel Madden
Contact: Sheila M. Marian, 617-253-1996, sheila@csail.mit.edu
Abstract:
With today's DNA sequencing technology, one can sequence an individual genome within a few days for a fraction of the cost of the original Human Genome Project (an estimated $3 billion over 10 years). The ultimate goal is a personal genome sequenced within a few hours as a hospital lab test, which would revolutionize modern health care and research areas such as cancer and HIV research. This also means that genomics labs are facing several terabytes of data per week that have to be processed efficiently.
This talk explores the potential and the current limitations of using database technology for high-throughput genomics. In particular, we are interested in supporting the initial stages of a typical high-throughput DNA sequencing pipeline. The talk gives an overview of the BioSeqDB project, in which we explored the applicability of extensible databases and SQL for declarative processing of bio-data. One specific result was Blue, a new efficient algorithm for error-correcting raw sequence data that combines statistical methods with scalable data processing algorithms based on k-mer consensus. Blue outperforms existing error-correction algorithms by up to two orders of magnitude in throughput while achieving higher accuracy on both Illumina and 454 data.
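As a rough illustration of the k-mer idea mentioned above, the following Python sketch counts k-mers across a set of reads and flags k-mers in a read that occur rarely, which is the usual starting signal for a sequencing error. This is not the Blue algorithm and does not use BioSeqDB; the function names, the toy reads, and the `min_count` threshold are all assumptions made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def suspicious_kmers(read, counts, k, min_count=2):
    """Return (position, k-mer) pairs whose k-mer is seen fewer than min_count times.

    min_count is a hypothetical threshold chosen for this toy example.
    """
    return [
        (i, read[i:i + k])
        for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < min_count
    ]


# Toy data: three identical reads and one read with a single substitution (A -> T).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
flagged = suspicious_kmers(reads[3], counts, k=4)
```

In this toy input, every k-mer that overlaps the substituted base appears only once, while k-mers from the agreeing reads appear several times. A consensus-based corrector would then try to replace the rare k-mers with frequent ones; that step is omitted here.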
About the Speaker:
Uwe Roehm is an associate professor of database systems at the University of Sydney. He is a computer science graduate of the University of Passau, Germany, and received his doctoral degree in 2002 from ETH Zurich, Switzerland. His research interests are cloud data management, databases on multicore servers, data replication, and data management for bioinformatics.
Created by Sheila M. Marian on Monday, December 09, 2013 at 2:02 PM.