ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Speaker: Michael W. Mahoney, ICSI and Department of Statistics, UC Berkeley

Date: Monday, September 28, 2020

Time: 2:00 PM to 3:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: https://mit.zoom.us/meeting/register/tJUrdOqopj8uHdO4gUyVMnfglOFEqIye_Je0 (Registration required if you haven't registered for this series before)

Event Type: Seminar

Host: Julian Shun, MIT CSAIL

Contact: Julian Shun, jshun@mit.edu, lindalynch@csail.mit.edu

Relevant URL: http://fast-code.csail.mit.edu/

Speaker URL: https://www.stat.berkeley.edu/~mmahoney/

Reminders to: fast-code-seminar@lists.csail.mit.edu, seminars@csail.mit.edu, pl@csail.mit.edu, commit@lists.csail.mit.edu, mitml@mit.edu

Reminder Subject: TALK: ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Abstract: Second-order optimization algorithms have a long history in scientific computing, but they tend not to be used much in machine learning. This is in spite of the fact that they gracefully handle step-size issues, poor conditioning, communication-computation tradeoffs, etc., all problems that are increasingly important in large-scale and high-performance machine learning. A large part of the reason is that their implementation requires some care, e.g., a good implementation isn't possible in a few lines of Python after taking a data science boot camp, and a naive implementation typically performs worse than heavily parameterized/hyperparameterized stochastic first-order methods. We describe ADAHESSIAN, a second-order stochastic optimization algorithm that dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian. ADAHESSIAN includes several novel performance-improving features: (i) a fast Hutchinson-based method to approximate the curvature matrix with low computational overhead; (ii) spatial averaging to reduce the variance of the second derivative; and (iii) a root-mean-square exponential moving average to smooth out variations of the second derivative across iterations. Extensive tests on natural language processing, computer vision, and recommendation system tasks demonstrate that ADAHESSIAN achieves state-of-the-art results. The cost per iteration of ADAHESSIAN is comparable to that of first-order methods, and ADAHESSIAN exhibits improved robustness to variations in hyperparameter values.
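
For readers who want the mechanics, the sketch below illustrates two of the ideas named in the abstract: the Hutchinson-based randomized estimate of the Hessian diagonal (computed from Hessian-vector products, without ever forming the Hessian), and the Adam-style root-mean-square exponential moving average over iterations. This is a minimal PyTorch sketch under our own assumptions; the function names, single-probe estimate, and simplified update are illustrative, not the authors' released implementation.

import torch

def hutchinson_hessian_diag(loss, params):
    """Estimate diag(H) as E[z * (Hz)] using a Rademacher probe z."""
    # Keep the gradient graph so we can differentiate through it again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Rademacher probe: each entry is +1 or -1 with equal probability.
    zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
    # Hessian-vector product Hz via a second backward pass.
    hzs = torch.autograd.grad(grads, params, grad_outputs=zs)
    # Elementwise z * Hz is an unbiased estimator of diag(H).
    return grads, [z * hz for z, hz in zip(zs, hzs)]

@torch.no_grad()
def adahessian_like_step(params, grads, hdiags, state,
                         lr=0.1, betas=(0.9, 0.999), eps=1e-8):
    """One Adam-style step using the Hessian diagonal in place of g*g."""
    for p, g, d in zip(params, grads, hdiags):
        m, v = state.setdefault(p, (torch.zeros_like(p), torch.zeros_like(p)))
        m.mul_(betas[0]).add_(g, alpha=1 - betas[0])         # EMA of gradient
        v.mul_(betas[1]).addcmul_(d, d, value=1 - betas[1])  # RMS EMA of diag(H)
        p.sub_(lr * m / (v.sqrt() + eps))

The full algorithm additionally applies spatial averaging to the diagonal estimate (e.g., averaging within each convolutional filter, the abstract's feature (ii)), along with bias correction and a Hessian power, all of which this sketch omits for brevity.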

Bio: Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he co-organized the Simons Institute's fall 2013 and 2018 programs on the foundations of data science, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. He is currently the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. He holds several patents for work done at Yahoo Research and as Lead Data Scientist for Vieu Labs, Inc., a startup reimagining consumer video for billions of users. More information is available at https://www.stat.berkeley.edu/~mmahoney/.

IMPORTANT NOTE FOR ATTENDEES: If you have already registered for the Fast Code Seminars on Zoom since July 27, 2020, please use the Zoom link that you have received. This link will stay the same for subsequent Fast Code seminars this semester. Zoom does not recognize a second registration, and will not send out the link a second time. If you have any problems with registration, please contact jshun@mit.edu and lindalynch@csail.mit.edu by 1:30 PM on the day of the seminar, so that we can try to resolve it before the seminar begins.

Research Areas:
Algorithms & Theory, AI & Machine Learning

Impact Areas:
Big Data

See other events that are part of the Fast Code Seminar Series 2020 - 2021.

Created by Julian J. Shun on Wednesday, September 23, 2020 at 12:07 PM.