Learning a Hierarchy of Speech Units with Self-Supervision

Speaker: David Harwath, University of Texas, Austin

Date: Thursday, November 16, 2023

Time: 2:00 PM to 3:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: 32-G882 (Stata Center - Hewlett Room)

Host: James Glass, MIT CSAIL

Contact: Marcia G. Davidson, 617-253-3049, marcia@csail.mit.edu

Reminders to: seminars@csail.mit.edu, sls@csail.mit.edu

Abstract:
Over the past several years, self-supervised learning has become an incredibly popular topic in the speech community. One appealing aspect of self-supervised speech models is that they can be used to discover discrete tokenizations of the speech signal. These tokens roughly correspond to units such as phonemes and can be used in lieu of symbolic transcriptions in a multitude of downstream tasks. Despite these successes, it is still not entirely clear what these self-supervised units capture or what methods can be used to control their emergence. In this talk, I will describe our lab's ongoing work to develop new models and methods for learning these units at multiple granularities, from phones to syllables and on to words. I will demonstrate how the choice of self-supervised learning objective influences what kind of units emerge from these models, show how different types of units can emerge in different locations within a model, and offer some thoughts on how we might learn a full hierarchy of speech units with a unified model.

Bio:
David Harwath is an assistant professor in the computer science department at UT Austin. His research focuses on multimodal, self-supervised learning algorithms for speech, audio, vision, and text. He received the NSF CAREER award (2023) and an ASRU best paper nomination (2015), and he was awarded the 2018 George M. Sprowls Award for the best computer science PhD thesis at MIT. He holds a B.S. in electrical engineering from UIUC (2010), an S.M. in computer science from MIT (2013), and a Ph.D. in computer science from MIT (2018).

This event is not part of a series.

Created by Marcia G. Davidson on Wednesday, November 15, 2023 at 9:45 AM.