Massive-Scale Processing of Record-Oriented and Graph Data

Speaker: Semih Salihoglu, Stanford InfoLab, Stanford University

Date: Monday, April 06, 2015

Time: 10:00 AM to 11:00 AM (Eastern Time)

Refreshments: 9:45 AM

Public: Yes

Location: 32-D463

Host: Sam Madden and Michael Stonebraker, Advanced Network Architecture group, CSAIL, MIT

Contact: Sheila M. Marian, 617-253-1996, sheila@csail.mit.edu

Reminders to: seminars@csail.mit.edu

Reminder Subject: TALK: Massive-Scale Processing of Record-Oriented and Graph Data

Abstract: Starting with MapReduce and its open-source version Hadoop, numerous distributed data-processing systems have been developed over the last decade, including key-value stores, stream processors, graph and machine learning systems, and scalable relational engines. While these new systems enable users to process various forms of data at scale, they also raise fundamental systems and theory questions about data processing in highly parallel, shared-nothing clusters. What is the right theoretical framework for understanding the costs of distribution and comparing different algorithms? Which algorithms, data structures, languages, debuggers, and testing tools should we use to program these systems?

In this talk, I will discuss some of my work in two of these areas. The first part presents a new theoretical framework to understand the costs of solving problems at different parallelism levels, based on a fundamental tradeoff between cluster machine sizes and communication. I will give examples of this tradeoff from the age-old problem of database joins within the context of MapReduce, and will discuss how to analyze the optimality of different algorithms in light of this tradeoff. The second part of the talk touches on my more system-oriented work on making it easier to program and debug distributed graph processing systems, such as Pregel and its open-source variants, Giraph and GPS. I will present a debugger called Graft, which we have built for the Apache Giraph system.

Bio: Semih Salihoglu is a PhD student working with Prof. Jennifer Widom in the Stanford InfoLab. He received his undergraduate degree from Yale University and worked for three years as a software engineer at Google before joining Stanford's PhD program. His PhD research has been supported by fellowships from Google and VMware.

This event is not part of a series.

Created by Sheila M. Marian on Friday, April 03, 2015 at 9:37 AM.