Graph Mining and Data Efficiency@Scale: From scalability to ML applications

Speaker: Vahab Mirrokni

Date: Tuesday, April 09, 2024

Time: 2:00 PM to 3:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: 32-G449 (Kiva)

Event Type: Seminar

Room Description: 32-G449 (Kiva)

Host: Julian Shun, MIT CSAIL

Contact: Julian Shun

Abstract: In this talk, I will first summarize our long journey in developing a scalable graph-based learning and data mining library at Google. This part of the talk covers problems in learning similarity graphs and hierarchical clustering at scale, and highlights the significance of combining ML, systems, and algorithmic techniques in designing such large-scale distributed and parallel data mining systems. I will then describe a number of recent applications in the space of data efficiency and data curation for machine learning models, including representative selection for ML efficiency and sensitivity sampling to improve the quality of LLMs.

Bio: Vahab Mirrokni is a Google Fellow and VP of Research at Google New York, leading a number of algorithm and optimization research groups including market algorithms, large-scale graph mining, and large-scale optimization. Previously he was a distinguished scientist and senior research director at Google. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University of Technology in 2001. He joined Google Research in 2008, after research positions at Microsoft Research, MIT and [...]. He is a co-winner of best paper awards at KDD, ACM EC, SODA, and INFORMS Revenue Management. His research areas include algorithms, ML optimization, and computational economics. Recently he has been working on algorithmic problems in the space of ML efficiency, online advertising, and LLMs. His full publication list by year can be found here.

Research Areas:
Algorithms & Theory, AI & Machine Learning, Programming Languages & Software, Systems & Networking

Impact Areas:
Big Data

This event is not part of a series.

Created by Julian J. Shun on Saturday, April 06, 2024 at 10:29 AM.