Speaker: Serena Booth

Date: Friday, October 06, 2023

Time: 9:30 AM to 11:00 AM (all times are in the Eastern Time Zone)

Abstract: The learned behaviors of AI systems and robots should align with the intentions of their human designers. In service of this goal, people, especially experts, must be able to easily specify, inspect, model, and revise AI system and robot behaviors. In this thesis, I study each of these problems. First, I study how experts write reward function specifications for reinforcement learning (RL). I find that these specifications are written with respect to the RL algorithm, not independently, and I find that experts often write erroneous specifications that fail to encode their true intent, even in a trivial setting. Second, I study how to inspect the agent's learned behaviors. To do so, I introduce two related methods to find environments which exhibit particular behaviors. These methods support humans in inspecting the behaviors an agent learns from a given specification. Third, I study cognitive science theories which govern how people build conceptual models to explain these observed examples of agent behaviors. While I find that some foundations of these theories are employed in typical interventions to support humans in learning about agent behaviors, I also find there is significant room to build better curricula for interaction, for example by showing counterexamples of alternative behaviors. I conclude by speculating about how these building blocks of human-AI interaction can be combined to enable people to revise their specifications, and, in doing so, create better aligned agents.

Advisor: Julie Shah
Committee: Dylan Hadfield-Menell, Leslie Kaelbling, Elena Glassman, and Peter Stone
