How does one bit-flip corrupt an entire deep neural network, and what to do about it

Speaker: Yanjing Li, University of Chicago

Date: Monday, April 01, 2024

Time: 4:00 PM to 5:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: 32-G575

Event Type: Seminar

Room Description: 32-G575

Host: Mengjia Yan, CSAIL MIT

Reminders to: seminars@csail.mit.edu

Abstract:
Deep neural networks are increasingly susceptible to hardware failures. The impact of hardware failures on these workloads is severe: even a single bit-flip can corrupt an entire network, during either training or inference. The urgency of tackling this challenge, known more broadly as the Silent Data Corruption challenge, has been widely raised by both industry and academia.

In this talk, I will begin with the first in-depth resilience study of DNN workloads under hardware failures that occur in the logic portion of deep learning accelerator systems. The study includes a comprehensive characterization of hardware failure effects and a fundamental understanding of how hardware failures propagate through hardware devices and interact with the workloads. Next, based on the insights obtained from our study, I will present ultra-lightweight yet highly effective techniques to mitigate hardware failures in deep learning accelerator systems.

Bio:

Yanjing Li is an Assistant Professor in the Department of Computer Science at the University of Chicago. Prior to joining the university, she was a senior research scientist at Intel Labs. Professor Li received her Ph.D. in Electrical Engineering from Stanford University, an M.S. in Mathematical Sciences (with honors) and a B.S. in Electrical and Computer Engineering (with a double major in Computer Science) from Carnegie Mellon University.

Professor Li has received various awards, including the NSF CAREER Award, the DAC Under-40 Innovators Award, the Google Research Scholar Award, the NSF/SRC Energy-Efficient Computing: from Devices to Architectures (E2CDA) program award, the Intel Labs Gordy Academy Award (the highest honor in Intel Labs) and several other Intel recognition awards, an outstanding dissertation award (European Design and Automation Association), and multiple best paper awards (ACM Great Lakes Symposium on VLSI, IEEE VLSI Test Symposium, and IEEE International Test Conference).

Research Areas:
Computer Architecture

This event is not part of a series.

Created by Nathan Higgins on Tuesday, March 12, 2024 at 10:48 AM.