Yoram Singer: Memory-Efficient Adaptive Optimization for Humungous-Scale Learning
Yoram Singer, Princeton University
Date: Tuesday, April 23, 2019
Time: 4:00 PM to 5:00 PM
Location: Patil/Kiva G449 (Gates Bldg, Stata)
Event Type: Seminar
Host: Aleksander Madry
Contact: Deborah Goodwin, 617.324.7303, email@example.com
Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each model parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of per-parameter adaptivity while allowing for larger models and mini-batches. We give convergence guarantees for our method and demonstrate its effectiveness in training some of the largest deep models used at Google.
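To make the memory trade-off concrete, here is a minimal NumPy sketch contrasting standard AdaGrad, which keeps one second-moment accumulator per parameter, with a cover-based accumulator for a matrix parameter that stores only per-row and per-column statistics (in the spirit of the sublinear-memory approach the talk describes; the function names and the specific row/column cover are illustrative assumptions, not the speaker's exact algorithm):

```python
import numpy as np

def adagrad_step(w, g, acc, lr=0.1, eps=1e-8):
    # Standard AdaGrad: one accumulator entry per parameter,
    # doubling the optimizer's memory footprint.
    acc += g * g
    w -= lr * g / np.sqrt(acc + eps)
    return w, acc

def sublinear_step(w, g, row_acc, col_acc, lr=0.1, eps=1e-8):
    # Cover-based sketch: the per-entry statistic is approximated by
    # min(row_acc[i], col_acc[j]), so an m x n matrix needs only
    # O(m + n) accumulator memory instead of O(m * n).
    nu = np.minimum.outer(row_acc, col_acc) + g * g
    w -= lr * g / np.sqrt(nu + eps)
    # Each cover set keeps the max statistic over the entries it covers.
    row_acc[:] = np.maximum(row_acc, nu.max(axis=1))
    col_acc[:] = np.maximum(col_acc, nu.max(axis=0))
    return w, row_acc, col_acc

m, n = 4, 3
rng = np.random.default_rng(0)
w = rng.standard_normal((m, n))
g = rng.standard_normal((m, n))
row_acc, col_acc = np.zeros(m), np.zeros(n)
w, row_acc, col_acc = sublinear_step(w, g, row_acc, col_acc)
print(row_acc.size + col_acc.size, "accumulator entries vs", m * n)
```

With zero initial accumulators the first step of both variants coincides (both divide by sqrt(g^2 + eps)); they diverge afterward, with the cover-based version overestimating some entries in exchange for the sublinear memory cost.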
Yoram Singer is the head of the Principles Of Effective Machine-learning (POEM) research group at Google Brain and a professor of Computer Science at Princeton University. He was a member of the technical staff at AT&T Research from 1995 through 1999 and an associate professor at the Hebrew University from 1999 through 2007. He is a fellow of AAAI, and his research on machine learning algorithms has received several awards.
Created by Deborah Goodwin on Thursday, April 18, 2019 at 11:00 AM.