Batch Normalization Causes Gradient Explosion in Deep Randomly Initialized Networks

Speaker: Greg Yang, Microsoft Research

Date: Wednesday, May 01, 2019

Time: 4:00 PM to 5:00 PM

Public: Yes

Location: 32-G575

Event Type: Seminar

Host: Govind Ramnarayan, Quanquan Liu, Sitan Chen, MIT CSAIL

Contact: Rebecca Yadegar, ryadegar@csail.mit.edu

Reminders to: seminars@csail.mit.edu, theory-seminars@csail.mit.edu

Reminder Subject: TALK: Greg Yang: Batch Normalization Causes Gradient Explosion in Deep Randomly Initialized Networks

Abstract: Batch Normalization (batchnorm) has become a staple in deep learning since its introduction in 2015. Its authors conjectured that “Batch Normalization may lead the layer Jacobians to have singular values close to 1,” and recent works suggest it benefits optimization by smoothing the optimization landscape during training. We disprove the “Jacobian singular value” conjecture for randomly initialized networks, showing that batchnorm causes gradient explosion that is exponential in depth. This implies that at initialization, batchnorm in fact “roughens” the optimization landscape. Empirically, this explosion prevents one from training ReLU networks with more than 50 layers without skip connections. We discuss several ways of mitigating this explosion and their relevance in practice.
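The following is a minimal sketch, not part of the talk, of how the depth dependence of gradients at random initialization can be probed empirically. It assumes PyTorch; the helper grad_norm_at_init and all hyperparameters (width 256, batch size 64, the list of depths) are illustrative choices, not taken from the abstract. The network is a plain fully connected batchnorm-ReLU stack without skip connections, and the printed quantity is the norm of the gradient of a scalar readout with respect to the input at initialization.

# Minimal sketch (assumes PyTorch): probe gradient norms vs. depth in a
# randomly initialized fully connected batchnorm-ReLU network.
# All names and hyperparameters below are illustrative, not from the talk.
import torch
import torch.nn as nn

def grad_norm_at_init(depth, width=256, batch_size=64, seed=0):
    """Norm of d(sum of outputs)/d(input) for a fresh, untrained network."""
    torch.manual_seed(seed)
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), nn.BatchNorm1d(width), nn.ReLU()]
    net = nn.Sequential(*layers)  # default training mode: batchnorm uses batch statistics
    x = torch.randn(batch_size, width, requires_grad=True)
    net(x).sum().backward()       # scalar readout, backpropagated to the input
    return x.grad.norm().item()

for depth in (5, 10, 20, 40):
    print(f"depth {depth:3d}: grad norm {grad_norm_at_init(depth):.3e}")

Under the claim in the abstract, one would expect these gradient norms to grow roughly exponentially as the depth increases, whereas the mitigations the talk discusses (for example, adding skip connections) would be expected to tame that growth.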

Research Areas:
Algorithms & Theory

See other events that are part of the Algorithms & Complexity Seminars 2018-2019.

Created by Rebecca Yadegar on Thursday, April 18, 2019 at 2:35 PM.