EI Seminar - Yuan Gong - Audio Large Language Models: From Sound Perception to Understanding
Speaker: Yuan Gong, MIT CSAIL
Date: Thursday, October 19, 2023
Time: 4:00 PM to 5:00 PM (all times are in the Eastern Time Zone)
Public: Yes
Location: 32-G575
Event Type: Seminar
Room Description: 32-G575
Contact: Hyung Ju Suh, 6269930697, hjsuh94@csail.mit.edu
Speaker URL: https://yuangongnd.github.io/
Our cognitive abilities enable us not only to perceive and identify sounds but also to comprehend their implicit meaning. While general audio event recognition has advanced significantly in recent years, models trained on discrete sound label sets have limited reasoning and understanding capabilities. For example, a model may recognize that a clock chimes 6 times but not know that this indicates 6 o'clock. Can we build an AI model that has both audio perception and reasoning ability?
In this talk, I will share our recent progress in audio large language model (LLM) development. Specifically, I will first introduce a novel GPT-assisted method for generating OpenAQA, our large-scale open-ended audio question-answering dataset. I will then discuss the key design choices and the model architecture of our audio large language model. Finally, I will discuss how to connect an automatic speech recognition model with an audio large language model for joint audio and speech understanding.
Research Areas:
AI & Machine Learning, Graphics & Vision, Human-Computer Interaction, Robotics
Created by Hyung Ju Suh at Wednesday, October 11, 2023 at 5:34 PM.