ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Michael W. Mahoney
, ICSI and Department of Statistics, UC Berkeley
Date: Monday, September 28, 2020
Time: 2:00 PM to 3:00 PM Note: all times are in the Eastern Time Zone
Location: https://mit.zoom.us/meeting/register/tJUrdOqopj8uHdO4gUyVMnfglOFEqIye_Je0 (Registration required, if you haven't registered for this series before)
Event Type: Seminar
Room Description: https://mit.zoom.us/meeting/register/tJUrdOqopj8uHdO4gUyVMnfglOFEqIye_Je0 (Registration required, if you haven't registered for this series before)
Host: Julian Shun, MIT CSAIL
Contact: Julian Shun, email@example.com, firstname.lastname@example.org
Relevant URL: http://fast-code.csail.mit.edu/
Speaker URL: https://www.stat.berkeley.edu/~mmahoney/
email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com
TALK: ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Abstract: Second order optimization algorithms have a long history in scientific computing, but they tend not to be used much in machine learning. This is in spite of the fact that they gracefully handle step size issues, poor conditioning problems, communication-computation tradeoffs, etc., all problems which are increasingly important in large-scale and high performance machine learning. A large part of the reason for this is that their implementation requires some care, e.g., a good implementation isn't possible in a few lines of python after taking a data science boot camp, and that a naive implementation typically performs worse than heavily parameterized/hyperparameterized stochastic first order methods. We describe ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian. ADAHESSIAN includes several novel performance-improving features, including: (i) a fast Hutchinson based method to approximate the curvature matrix with low computational overhead; (ii) a spatial averaging to reduce the variance of the second derivative; and (iii) a root-mean-square exponential moving average to smooth out variations of the second-derivative across different iterations. Extensive tests on natural language processing, computer vision, and recommendation system tasks demonstrate that ADAHESSIAN achieves state-of-the-art results. The cost per iteration of ADAHESSIAN is comparable to first-order methods, and ADAHESSIAN exhibits improved robustness towards variations in hyperparameter values.
Bio: Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received him PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he co-organized the Simons Institute's fall 2013 and 2018 programs on the foundations of data science, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. He is currently the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. He holds several patents for work done at Yahoo Research and as Lead Data Scientist for Vieu Labs, Inc., a startup reimagining consumer video for billions of users. More information is available at https://www.stat.berkeley.edu/~mmahoney/.
IMPORTANT NOTE FOR ATTENDEES: If you have already registered for the Fast Code Seminars on Zoom since July 27, 2020, please use the Zoom link that you have received. This link will stay the same for subsequent Fast Code seminars this semester. Zoom does not recognize a second registration, and will not send out the link a second time. If you have any problems with registration, please contact firstname.lastname@example.org and email@example.com by 1:30pm on the day of the seminar, so that we can try to resolve it before the seminar begins.
Algorithms & Theory, AI & Machine Learning
Created by Julian J. Shun at Wednesday, September 23, 2020 at 12:07 PM.