Aleks Petrov: When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations

Speaker: Aleks Petrov

Date: Friday, December 08, 2023

Time: 11:00 AM to 12:00 PM. Note: all times are in the Eastern Time Zone

Public: Yes

Location: G449 (KIVA)

Event Type:

Room Description:

Host: Aleksander Madry

Contact: Deborah Goodwin, dlehto@csail.mit.edu

Relevant URL:

Speaker URL: None

Speaker Photo:

Reminders to: seminars@csail.mit.edu

Reminder Subject: TALK: Aleks Petrov: When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations

Abstract:
Large Language Models (LLMs), typically based on the transformer architecture, have undergone extensive research aimed at enhancing their scalability and performance. Despite these technological advances, the resource-intensive nature of training state-of-the-art transformer models limits their accessibility to a broader spectrum of researchers and smaller enterprises. As a result, context-based fine-tuning methods, including prompting, in-context learning, soft prompting (also known as prompt tuning), and prefix-tuning, have gained popularity because they often match the performance of full fine-tuning with only a fraction of the parameters. However, despite their empirical successes, there is little theoretical understanding of how these techniques influence the internal computation of the model, or of their expressiveness limitations. Our research presents the first theoretical framework for context-based fine-tuning methods. We establish that these methods cannot alter the attention patterns over the content; they can only skew the outputs of an attention layer in a predetermined direction. This inherent limitation restricts the scope of what can be achieved through context-based fine-tuning. While these methods can effectively prompt a model to manifest a skill it has already acquired, combining pre-existing skills is a more complex challenge and is harder to optimise. Crucially, our analysis demonstrates that context-based fine-tuning methods, including prompting, cannot make the model learn entirely new behaviours.
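The following is a minimal numpy sketch (illustrative only, not taken from the talk) of the claim that a prefix token cannot change the attention pattern over the content: with content keys K and values V and a single learned prefix key/value pair k_p, v_p, the head's output equals (1 - alpha) times the content-only attention output plus alpha times the fixed direction v_p, where only the scalar alpha depends on the content. All variable names and dimensions here are assumptions made for the example.

    # Illustrative sketch (assumed setup, not the talk's code): a single
    # attention head with one prefix token only shifts the output along the
    # fixed prefix value direction v_p, scaled by a content-dependent alpha.
    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 4, 3                     # head dimension and number of content tokens (arbitrary)

    q = rng.normal(size=d)          # query for the current position
    K = rng.normal(size=(T, d))     # content keys
    V = rng.normal(size=(T, d))     # content values
    k_p = rng.normal(size=d)        # prefix key (learned in prefix-tuning)
    v_p = rng.normal(size=d)        # prefix value (learned in prefix-tuning)

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # Attention with the prefix prepended to the content
    scores = np.concatenate(([k_p @ q], K @ q)) / np.sqrt(d)
    probs = softmax(scores)
    out_with_prefix = probs[0] * v_p + probs[1:] @ V

    # The same output, rewritten as a combination of the content-only attention
    # output and the fixed prefix direction v_p: the relative attention over
    # the content tokens is unchanged by the prefix.
    alpha = probs[0]
    content_only = softmax((K @ q) / np.sqrt(d)) @ V
    reconstructed = (1 - alpha) * content_only + alpha * v_p

    print(np.allclose(out_with_prefix, reconstructed))  # True

In this view, prefix-tuning can steer the head's output toward v_p but cannot redistribute attention among the content tokens, which is the intuition behind the expressiveness limitation described above.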

Research Areas:

Impact Areas:

This event is not part of a series.

Created by Deborah Goodwin on Thursday, November 30, 2023 at 1:26 PM.