ML Efficiency for Large Models: Faster Transformers, Sparsity, and Beyond

Speaker: Vahab Mirrokni, Google Research

Date: Thursday, April 11, 2024

Time: 2:00 PM to 3:00 PM Note: all times are in the Eastern Time Zone

Public: Yes

Location: 32-D507

Event Type: Seminar

Room Description: 32-D507

Host: Noah Golowich, MIT

Contact: Noah Golowich,



*Updated time -- 2-3pm (previous time was incorrect)*

Abstract: Scaling large models efficiently for faster training and inference is a fundamental challenge. In this talk, we present a number of algorithmic challenges and potential solutions, from theory to practice. First, we discuss data-efficiency and model-efficiency problems that can be formalized as subset selection. For model efficiency, we present sequential attention for feature selection and sparsification [ICLR'23, arXiv]. For data efficiency, we present a sensitivity-sampling technique that improves both the quality and the efficiency of the models. We then turn to the intrinsic quadratic complexity of attention models and of token generation. We first discuss HyperAttention, a technique for developing linear-time attention algorithms under mild assumptions [ICLR'24]. We then present PolySketchFormer, a technique that bypasses the hardness results for sub-quadratic attention by sketching polynomial functions [arXiv]. Finally, we show how to address the complexity of token generation via clustering techniques [arXiv].
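To illustrate the idea behind sub-quadratic attention via polynomial functions, the sketch below replaces the softmax similarity with a degree-p polynomial kernel and uses an explicit feature map, so the key-value product can be computed first and the cost becomes linear in sequence length. This is a minimal, hypothetical illustration of the general technique only, not the PolySketchFormer algorithm itself, which additionally applies sketching to keep the feature dimension small; the function name and parameters are invented for this example.

```python
import numpy as np

def poly_attention(Q, K, V, p=2):
    """Attention with a degree-p polynomial kernel, linear in sequence length n.

    Instead of softmax(Q K^T) V, uses similarity (q . k)^p via an explicit
    feature map phi with phi(q) . phi(k) = (q . k)^p.  Associating the
    product as Qf (Kf^T V) costs O(n * d^p) rather than O(n^2) -- the core
    idea behind polynomial-kernel attention (illustrative sketch only).
    """
    def phi(X):
        # Degree-p tensor-product feature map: rows are p-fold outer products.
        F = X
        for _ in range(p - 1):
            F = np.einsum('ni,nj->nij', F, X).reshape(X.shape[0], -1)
        return F

    Qf, Kf = phi(Q), phi(K)
    num = Qf @ (Kf.T @ V)                          # compute K^T V first: O(n)
    den = Qf @ Kf.sum(axis=0, keepdims=True).T     # per-row normalizer
    return num / np.maximum(den, 1e-9)             # guard against tiny rows
```

For even p the kernel is non-negative, so the normalizer plays the role of the softmax denominator; the quadratic-time reference computation `((Q @ K.T)**p / row_sums) @ V` produces the same output.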

Bio: Vahab Mirrokni is a Google Fellow and VP of Research at Google New York, leading a number of algorithm and optimization research groups, including market algorithms, large-scale graph mining, and large-scale optimization. Previously he was a Distinguished Scientist and Senior Research Director at Google. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University of Technology in 2001. He joined Google Research in 2008, after research positions at Microsoft Research and MIT. He is a co-winner of best paper awards at KDD, ACM EC, SODA, and INFORMS Revenue Management. His research areas include algorithms, ML optimization, and computational economics. Recently he has been working on algorithmic problems in the space of ML efficiency, online advertising, and LLMs. His publications by year can be found here.

Research Areas:
Algorithms & Theory

Impact Areas:
Big Data

See other events that are part of the Algorithms and Complexity Seminar 2024.

Created by Noah Golowich on Tuesday, April 09, 2024 at 1:48 PM.