Thesis Defense: Building Blocks for Human-AI Alignment: Specify, Inspect, Model, and Revise

Speaker: Serena Booth

Date: Friday, October 06, 2023

Time: 9:30 AM to 11:00 AM (all times are in the Eastern Time Zone)

Public: Yes

Location: Room 32-D463 (Star)

Event Type: Thesis Defense

Contact: Serena Booth, serenabooth@csail.mit.edu

Reminders to: seminars@lists.csail.mit.edu

Reminder Subject: TALK: Thesis Defense - Serena Booth - Building Blocks for Human-AI Alignment: Specify, Inspect, Model, and Revise

Abstract: The learned behaviors of AI systems and robots should align with the intentions of their human designers. In service of this goal, people, especially experts, must be able to easily specify, inspect, model, and revise AI system and robot behaviors. In this thesis, I study each of these problems. First, I study how experts write reward function specifications for reinforcement learning (RL). I find that these specifications are written with respect to the RL algorithm, not independently of it, and that experts often write erroneous specifications that fail to encode their true intent, even in a trivial setting. Second, I study how to inspect an agent's learned behaviors. To do so, I introduce two related methods for finding environments in which the agent exhibits particular behaviors. These methods support humans in inspecting the behaviors an agent learns from a given specification. Third, I study cognitive science theories that govern how people build conceptual models to explain these observed examples of agent behaviors. While I find that some foundations of these theories are employed in typical interventions to support humans in learning about agent behaviors, I also find significant room to build better curricula for interaction, for example by showing counterexamples of alternative behaviors. I conclude by speculating about how these building blocks of human-AI interaction can be combined to enable people to revise their specifications and, in doing so, create better-aligned agents.

Advisor: Julie Shah
Committee: Dylan Hadfield-Menell, Leslie Kaelbling, Elena Glassman, and Peter Stone
Contact Serena Booth, sbooth@mit.edu, for a Zoom link

Research Areas:
AI & Machine Learning


Created by Serena Booth on Monday, October 02, 2023 at 10:16 AM.