EI Seminar - Yuan Gong - Audio Large Language Models: From Sound Perception to Understanding

Speaker: Yuan Gong, MIT CSAIL

Date: Thursday, October 19, 2023

Time: 4:00 PM to 5:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: 32-G575

Event Type: Seminar

Room Description: 32-G575

Host:

Contact: Hyung Ju Suh, 6269930697, hjsuh94@csail.mit.edu

Relevant URL:

Speaker URL: https://yuangongnd.github.io/

Our cognitive abilities enable us not only to perceive and identify sounds but also to comprehend their implicit meaning. While general audio event recognition has advanced significantly in recent years, models trained on discrete sound label sets have limited reasoning and understanding capabilities. For example, a model may recognize that a clock chimes six times but not know that this indicates a time of 6 o'clock. Can we build an AI model that has both audio perception and reasoning ability?

In this talk, I will share our recent progress in audio large language model (LLM) development. Specifically, I will first introduce a novel GPT-assisted method for generating OpenAQA, our large-scale open-ended audio question-answering dataset. I will then discuss the key design choices and the model architecture of our audio large language model. Finally, I will discuss how to connect an automatic speech recognition model with an audio large language model for joint audio and speech understanding.

Research Areas:
AI & Machine Learning, Graphics & Vision, Human-Computer Interaction, Robotics

Impact Areas:

This event is not part of a series.

Created by Hyung Ju Suh on Wednesday, October 11, 2023 at 5:34 PM.