Thesis Defense: Efficient and Robust Algorithms for Practical Machine Learning

Speaker: Yujia Bao, MIT CSAIL

Date: Friday, May 06, 2022

Time: 12:00 PM to 1:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: 32-D463 (Stata Center - Star Conference Room)

Event Type: Thesis Defense

Host: Regina Barzilay, MIT CSAIL

Contact: Yujia Bao, yujia@csail.mit.edu

Speaker URL: https://people.csail.mit.edu/yujia/

Reminders to: seminars@csail.mit.edu

Reminder Subject: TALK: Thesis Defense: Efficient and Robust Algorithms for Practical Machine Learning

This seminar series is online for everyone.
MIT Community members may attend in person.

For remote access to this event:
https://mit.zoom.us/j/4492242635

Thesis Advisor: Regina Barzilay
Thesis Committee: Dina Katabi, Pulkit Agrawal

Abstract:
Machine learning models are biased when trained on biased datasets. While many approaches have been proposed to mitigate such biases, they often require a human expert to identify and annotate the biases a priori. This thesis proposes three efficient algorithms for learning robust models. These algorithms do not require explicit annotations of the biases, enabling practical machine learning.

First, we introduce an algorithm that operates on data collected from multiple environments, across which the correlation between unstable (bias) features and the label may vary. While these biases are not explicitly annotated, we show that when a classifier trained on one environment makes predictions on examples from a different environment, its mistakes are informative of the unstable correlations. We leverage these mistakes to create groups of examples whose interpolation yields a distribution with only stable correlations.
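To make the idea concrete, here is a minimal sketch on synthetic data. The environment construction and the equal-weight group resampling are illustrative simplifications (the actual algorithm interpolates the group distributions), not the thesis implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_env(n, bias_strength):
    """Synthetic environment: feature 0 is stable, feature 1 is a bias
    whose agreement with the label depends on the environment."""
    y = rng.integers(0, 2, n)
    stable = y + 0.5 * rng.normal(size=n)
    bias = np.where(rng.random(n) < bias_strength, y, 1 - y) + 0.1 * rng.normal(size=n)
    return np.column_stack([stable, bias]), y

X1, y1 = make_env(2000, bias_strength=0.95)   # bias strongly aligned with label
X2, y2 = make_env(2000, bias_strength=0.60)   # alignment shifts in environment 2

# 1) Train a classifier on environment 1 only; it will pick up the bias.
clf = LogisticRegression().fit(X1, y1)

# 2) Its mistakes on environment 2 flag examples where the unstable
#    correlation flips; partition environment 2 by (label, correct?).
correct = clf.predict(X2) == y2
groups = {(c, m): np.where((y2 == c) & (correct == m))[0]
          for c in (0, 1) for m in (True, False)}

# 3) Give every group equal weight so no unstable correlation survives,
#    then retrain; only the stable feature explains all groups at once.
k = min(len(idx) for idx in groups.values())
balanced = np.concatenate([rng.choice(idx, k, replace=False) for idx in groups.values()])
robust_clf = LogisticRegression().fit(X2[balanced], y2[balanced])
print("weights (stable, bias):", clf.coef_[0], "->", robust_clf.coef_[0])
```

Because every (label, correct/mistake) group carries equal weight, a correlation that holds only in some groups no longer pays off, and the retrained classifier shifts its weight onto the stable feature.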

We then consider the setting where we lack access to multiple environments, a common scenario for new or resource-limited tasks. We observe that, in real-world applications, related tasks often share similar biases. Based on this observation, we propose an algorithm that infers the bias features from a resource-rich source task and transfers this knowledge to the target task.
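A hedged sketch of the transfer step, assuming the source task has already revealed which coordinate is unstable; the `bias_encoder` below is a hypothetical stand-in for the transferred bias representation, not the thesis' API:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def bias_encoder(X):
    # Hypothetical: suppose the source task identified coordinate 1
    # as the shared unstable feature; reuse that knowledge here.
    return X[:, 1:2]

# Target task: a single environment with the same hidden bias.
y = rng.integers(0, 2, 2000)
stable = y + 0.5 * rng.normal(size=2000)
bias = np.where(rng.random(2000) < 0.9, y, 1 - y) + 0.1 * rng.normal(size=2000)
X = np.column_stack([stable, bias])

# 1) Cluster target examples in the transferred bias-feature space.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(bias_encoder(X))

# 2) Balance every (label, cluster) group so the bias cannot predict the label.
groups = {(c, g): np.where((y == c) & (clusters == g))[0]
          for c in (0, 1) for g in (0, 1)}
k = min(len(idx) for idx in groups.values())
balanced = np.concatenate([rng.choice(idx, k, replace=False) for idx in groups.values()])
clf = LogisticRegression().fit(X[balanced], y[balanced])
print("weights (stable, bias):", clf.coef_[0])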

Finally, we propose an algorithm for automatic bias detection where we are given only a set of input-label pairs. Our algorithm learns to split the dataset so that classifiers trained on the training split cannot generalize to the testing split. The performance gap provides a proxy for the degree of bias in the learned features and can therefore be used to identify unknown biases.
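A toy sketch of the split search on synthetic data; a simple alternating heuristic (train, score confidence, re-split) stands in for the thesis' learned splitter:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# One dataset of input-label pairs; coordinate 1 is a hidden bias.
y = rng.integers(0, 2, 2000)
stable = y + 0.5 * rng.normal(size=2000)
bias = np.where(rng.random(2000) < 0.9, y, 1 - y) + 0.1 * rng.normal(size=2000)
X = np.column_stack([stable, bias])

mask = rng.random(2000) < 0.67            # initial random 2/3 train split
for _ in range(5):
    clf = LogisticRegression().fit(X[mask], y[mask])
    # Confidence the current classifier assigns to each true label.
    conf = clf.predict_proba(X)[np.arange(2000), y]
    # Splitter update: keep the most predictable 2/3 in train and push
    # the least predictable 1/3 to test, widening the generalization gap.
    mask = conf >= np.quantile(conf, 1 / 3)

clf = LogisticRegression().fit(X[mask], y[mask])
gap = clf.score(X[mask], y[mask]) - clf.score(X[~mask], y[~mask])
print("generalization gap (bias proxy):", round(gap, 3))
```

A near-zero gap would suggest no exploitable bias, while a large gap, as the hidden bias produces here, flags learned features that do not transfer across the discovered split.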

Research Areas:
AI & Machine Learning

This event is not part of a series.

Created by Yujia Bao on Wednesday, March 16, 2022 at 4:33 PM.