Reinforcement Learning from Human Feedback: Algorithms & Applications

Speaker: Eric Mitchell, Stanford

Date: Friday, December 01, 2023

Time: 11:00 AM to 12:00 PM (Eastern Time)

Public: Yes

Location: Seminar Room G449 (Patil/Kiva)

Host: Dylan Hadfield-Menell, MIT

Contact: Dylan Hadfield-Menell

Speaker URL: None

Reminder Subject: TALK: Reinforcement Learning from Human Feedback: Algorithms & Applications

The release of ChatGPT in November 2022 triggered a phase shift in public interest, private investment, and research priorities in AI. In this talk, I will give an overview of the “classical” approach to reinforcement learning from human feedback (RLHF), the key method behind ChatGPT’s capabilities, including its formal problem statement and typical implementation. I will then describe our recently proposed alternative algorithm, Direct Preference Optimization (DPO), which reduces the computational cost, implementation complexity, and potential instabilities of the classical RLHF pipeline and has catalyzed strong open-source chat models such as HuggingFace Zephyr and AI2’s Tülu. Finally, I will discuss applications of DPO to current challenges in NLP, such as improving the factuality and reducing the “hallucinations” of large language models.

Eric Mitchell is a final-year PhD student at Stanford University, advised by Chelsea Finn and Christopher D. Manning. His research develops methods to improve the reliability of large language models, in particular by enabling them to better align with human values, respond to changes in the state of the world, and reason soundly.

Research Areas:
AI & Machine Learning

This event is not part of a series.

Created by Dylan Hadfield-Menell on Monday, November 27, 2023 at 3:37 PM.