From Compression to Acceleration: Efficient Methods for Deep Learning

Speaker: Song Han, Stanford University

Date: Tuesday, December 13, 2016

Time: 10:30 AM to 11:30 AM Note: all times are in the Eastern Time Zone

Public: Yes

Location: 36-428 (Haus Room)

Host: Prof. Vivienne Sze, Prof. Tommi Jaakkola

Contact: Teresa Cataldo

Reminder Subject: TALK: From Compression to Acceleration: Efficient Methods for Deep Learning

Deep neural networks have become the state-of-the-art technique for machine-learning tasks ranging from computer vision to speech recognition to natural language processing. However, running such networks is both computationally and memory intensive, making them power hungry and difficult to deploy on embedded systems with limited power budgets. To address this limitation, this talk presents an algorithm-and-hardware co-design methodology for improving the efficiency of deep learning.

Starting with the algorithm, this talk introduces “Deep Compression,” which can compress deep neural network models by 10x-49x without loss of prediction accuracy for a broad range of CNNs, RNNs, and LSTMs. Turning to hardware, the talk then introduces EIE, the "Efficient Inference Engine," which performs decompression and inference simultaneously, significantly saving memory bandwidth. By taking advantage of the compressed model and handling the irregular computation pattern efficiently, EIE achieves a 13x speedup and 3,000x better energy efficiency over a GPU.
Finally, this talk closes the loop by revisiting model compression and offering practical guidance on hardware-efficiency-oriented model compression techniques.
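To make the idea concrete, a minimal sketch of the two ingredients the abstract alludes to: magnitude-based pruning (small weights are zeroed) and compressed sparse storage that lets inference skip the zeros. The function names, threshold, and CSR-style layout here are illustrative assumptions, not details taken from the talk.

```python
def prune(weights, threshold):
    """Zero out weights whose magnitude falls below the threshold (illustrative)."""
    return [[w if abs(w) >= threshold else 0.0 for w in row]
            for row in weights]

def to_csr(weights):
    """Keep only nonzero weights as (values, column indices, row pointers)."""
    values, cols, row_ptr = [], [], [0]
    for row in weights:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr

def sparse_matvec(csr, x):
    """Multiply a CSR matrix by a dense vector, touching only the nonzeros."""
    values, cols, row_ptr = csr
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[cols[k]]
        y.append(acc)
    return y

# Toy example: half the weights fall below the threshold and are dropped,
# so the sparse multiply does half the work of the dense one.
W = [[0.9, 0.05, 0.0], [-0.02, 0.7, -0.6]]
csr = to_csr(prune(W, threshold=0.1))
y = sparse_matvec(csr, [1.0, 1.0, 1.0])
```

A real compression pipeline would retrain after pruning to recover accuracy and add weight sharing and entropy coding on top; this sketch only shows why a sparse representation saves both memory and arithmetic.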

Song Han is a fifth-year Ph.D. student advised by Prof. Bill Dally at Stanford University. His research focuses on energy-efficient deep learning, at the intersection of machine learning and computer architecture. He proposed Deep Compression, which can compress state-of-the-art CNNs by 10x-49x while fully preserving prediction accuracy. He designed EIE, the Efficient Inference Engine, a hardware accelerator that performs inference directly on the compressed sparse model, yielding significant speedups and energy savings. His work has been covered by TheNextPlatform, TechEmergence, Embedded Vision, and O’Reilly, and it received the Best Paper Award at ICLR’16 and the Best Poster Award at the Stanford Cloud Workshop’16.

This event is not part of a series.

Created by Teresa Cataldo Email at Wednesday, December 07, 2016 at 11:43 AM.