CDML Seminar: Greg Yang. Title: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Speaker: Greg Yang, Microsoft Research

Date: Thursday, April 14, 2022

Time: 1:00 PM to 2:00 PM (Eastern Time)

Public: Yes

Location: 32 Vassar St., Stata Bldg, G575

Host: Aleksander Madry

Contact: Deborah Goodwin, dlehto@csail.mit.edu

Reminders to: seminars@csail.mit.edu

Abstract:
You can’t train GPT-3 on a single GPU, much less tune its hyperparameters (HPs)…or so it seems. I’m here to tell you this is not true: you can tune its HPs on a single GPU even if you can’t train it that way! In the first half of this talk, I’ll describe how, in the so-called maximal update parametrization (abbreviated µP), narrow and wide neural networks share the same set of optimal HPs. This lets us tune any large model by just tuning a small version of it; we call this µTransfer. In particular, this allowed us to tune the 6.7-billion-parameter version of GPT-3 using only 7% of its pretraining compute budget and, with some asterisks, obtain performance comparable to the original GPT-3 model with twice the parameter count. In the second half of this talk, I’ll discuss why µP has this special property and how it connects to the study of infinite-width neural networks and, more generally, to the theory of Tensor Programs. The first half targets general practitioners and empirical researchers in machine learning, while the second half targets the more theoretically curious. This talk is based on https://arxiv.org/abs/2203.03466

If you would like to sign up for a 1:1 meeting with Greg Yang: https://docs.google.com/spreadsheets/d/1tuirwZ0ClnX0Wde-g2-LDafaCWqPhUG1AU3TOpavYMQ/edit?usp=sharing

This event is not part of a series.

Created by Deborah Goodwin on Friday, March 25, 2022, at 2:54 PM.