CRIBB Seminar: Computing on Masked Big Data

Speaker: Jeremy Kepner, Vijay Gadepally, Pete Michaleas, Nabil Schear, and Mayank Varia, MIT Lincoln Laboratory

Date: Friday, February 07, 2014

Time: 12:00 PM to 1:00 PM (all times are in the Eastern Time Zone)

Refreshments: 11:45 AM

Public: Yes

Location: 32-141

Host: Professor Alan Edelman, MIT

Contact: Patrice Macaluso, macaluso@csail.mit.edu

Reminder Subject: TALK: CRIBB Seminar: Computing on Masked Big Data

The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. Along with these three Vs of big data, an increasingly important fourth challenge is veracity. Big data volume stresses the storage, memory, and compute capacity of a computing system and requires access to a computing cloud. The velocity of big data stresses the rate at which data can be absorbed and meaningful answers produced. Big data variety requires vast quantities of highly diverse data (text, computer logs, social media data, etc.) to be automatically ingested. Traditional techniques for assuring the veracity of data incur overheads that are often too large to apply to big data, and there is increasing interest in investigating alternative techniques. Computing on Masked Data (CMD) is one such low-overhead technique: data is masked, operated on, and then unmasked when the answers are desired. CMD relies on the sparse linear algebra of associative arrays to transform computations from a space where + and * are the primary low-level operations to one where =, >, and < are the primary low-level operations. Databases with strong support for sparse operations (such as SciDB or Apache Accumulo) are ideally suited to this technique. A demonstration of the technique on DNA sequence data shows how DNA data can be masked, a complex DNA matching algorithm can be performed on the masked DNA data, and the result can be unmasked to reveal the true answer. CMD can be performed with significantly less overhead than other approaches while also supporting a full range of linear algebraic operations on the masked data.
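
The mask, compute, unmask flow described in the abstract can be illustrated with a toy sketch. This is not the speakers' implementation: it assumes, purely for illustration, that masking is a deterministic keyed hash (HMAC-SHA256) of short DNA substrings (k-mers), so that the only operation needed on masked values is equality (=). The key, the sequences, and the helper names `kmers`, `mask`, and `masked_match` are all hypothetical.

```python
import hashlib
import hmac


def kmers(seq, k):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mask(value, key):
    """Deterministically mask a string with a keyed hash.

    Assumption for this sketch only: equal inputs give equal masks,
    so equality tests still work on masked values.
    """
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def masked_match(masked_a, masked_b):
    """Find shared entries using only equality on masked values."""
    return masked_a & masked_b


key = b"demo-secret-key"   # hypothetical key held by the data owner
k = 4
sample = "ACGTACGTTAGC"    # made-up sequences, not real data
reference = "TTACGTTAGG"

# Data owner: mask both k-mer sets and keep a private table for unmasking.
all_kmers = kmers(sample, k) | kmers(reference, k)
unmask_table = {mask(x, key): x for x in all_kmers}
masked_sample = {mask(x, key) for x in kmers(sample, k)}
masked_reference = {mask(x, key) for x in kmers(reference, k)}

# Untrusted side: compute on masked values only.
masked_common = masked_match(masked_sample, masked_reference)

# Data owner: unmask the answer.
common = sorted(unmask_table[m] for m in masked_common)
print(common)
```

The matching step never sees a plaintext k-mer, yet the unmasked result equals what a direct comparison of the two sequences would give. Supporting > and < as well would need an order-preserving mask, which this sketch does not attempt.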

This event is not part of a series.

Created by Patrice Macaluso on Thursday, February 06, 2014 at 3:50 PM.