BEGIN:VCALENDAR
VERSION:2.0
PRODID:icalendar-ruby
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
DTSTART:20170312T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20171105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230203T154636Z
UID:ca4f9c4b-2000-4ee5-961e-2770cd748c49
DTSTART;TZID=America/New_York:20170612T160000
DTEND;TZID=America/New_York:20170612T170000
CREATED:20170609T101041
DESCRIPTION:Abstract:\nStochastic gradient descent (SGD) is the gold
  standard of optimization in deep learning. It does not\, however\,
  exploit the special structure and geometry of the loss functions we
  wish to optimize\, viz. those of deep neural networks. In this talk\,
  we will focus on the geometry of the energy landscape at local
  minima with the aim of understanding the generalization properties
  of deep networks.\n\nIn practice\, optima discovered by SGD have a
  large proportion of almost-zero eigenvalues in the Hessian with very
  few positive or negative eigenvalues. We will first leverage this
  observation to construct an algorithm named Entropy-SGD that
  maximizes a local version of the free energy. Such a loss function
  favors flat regions of the energy landscape which are robust to
  perturbations and hence more generalizable\, while simultaneously
  avoiding sharp\, poorly-generalizable --- although possibly deep ---
  valleys. We will discuss connections of this algorithm with belief
  propagation and robust ensemble learning. Furthermore\, we will
  establish a tight connection between such non-convex optimization
  algorithms and nonlinear partial differential equations. Empirical
  validation on CNNs and RNNs shows that Entropy-SGD and related
  algorithms compare favorably to state-of-the-art techniques in terms
  of both generalization error and training time.\n\narXiv:
  https://arxiv.org/abs/1611.01838\, https://arxiv.org/abs/1704.04932
 \n\nBio:\nPratik Chaudhari is a PhD candidate in Computer Science at
  UCLA. With his advisor Stefano Soatto\, he focuses on optimization
  algorithms for deep networks. He holds Master's and Engineer's
  degrees in Aeronautics and Astronautics from MIT\, where he worked
  on stochastic estimation and randomized motion planning algorithms
  for urban autonomous driving with Emilio Frazzoli.
LAST-MODIFIED:20170609T101041
LOCATION:32-D507
SUMMARY:A picture of the energy landscape of deep neural networks
URL:https://calendar.csail.mit.edu/events/188221
END:VEVENT
END:VCALENDAR