From Compression to Acceleration: Efficient Methods for Deep Learning
Speaker: Song Han, Stanford University
Date: Tuesday, December 13, 2016
Time: 10:30 AM to 11:30 AM (all times are in the Eastern Time Zone)
Location: 36-428 (Haus Room)
Host: Prof. Vivienne Sze, Prof. Tommi Jaakkola
Contact: Teresa Cataldo, email@example.com
Speaker URL: None
TALK: From Compression to Acceleration: Efficient Methods for Deep Learning
Deep neural networks have become the state-of-the-art technique for machine-learning tasks ranging from computer vision to speech recognition to natural language processing. However, running such networks is both computationally and memory intensive, making them power-hungry and difficult to deploy on embedded systems with limited power budgets. To address this limitation, this talk presents an algorithm and hardware co-design methodology for improving the efficiency of deep learning.
Starting on the algorithm side, this talk introduces Deep Compression, which can compress deep neural network models by 10x-49x without loss of prediction accuracy for a broad range of CNNs, RNNs, and LSTMs. Turning to hardware architecture, the talk then introduces EIE, the "Efficient Inference Engine," which performs decompression and inference simultaneously, significantly saving memory bandwidth. By taking advantage of the compressed model and handling its irregular computation pattern efficiently, EIE achieves 13x speedup and 3000x better energy efficiency over a GPU.
Finally, this talk closes the loop by revisiting model compression and providing practical guidance on hardware-efficiency-oriented model compression techniques.
Song Han is a fifth-year Ph.D. student with Prof. Bill Dally at Stanford University. His research focuses on energy-efficient deep learning computing, at the intersection of machine learning and computer architecture. He proposed Deep Compression, which can compress state-of-the-art CNNs by 10x-49x while fully preserving prediction accuracy. He designed EIE, the Efficient Inference Engine, a hardware accelerator that performs inference directly on the compressed sparse model, yielding significant speedups and energy savings. His work has been covered by TheNextPlatform, TechEmergence, Embedded Vision, and O'Reilly, and received the Best Paper Award at ICLR'16 and the Best Poster Award at the Stanford Cloud Workshop '16.
Created by Teresa Cataldo on Wednesday, December 07, 2016, at 11:43 AM.