Unsupervised Neural and Bayesian Models for Zero-Resource Speech Processing

Speaker: Herman Kamper, Toyota Technological Institute at Chicago (TTIC)

Date: Tuesday, November 15, 2016

Time: 4:00 PM to 5:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: 32-G882 (Stata Center - Hewlett Room)

Host: Jim Glass, MIT CSAIL

Contact: Marcia G. Davidson, 617-253-3049, marcia@csail.mit.edu

In settings where only unlabelled speech data is available, zero-resource speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. There are two central problems in zero-resource speech processing: (i) finding frame-level feature representations which make it easier to discriminate between linguistic units (phones or words), and (ii) segmenting and clustering unlabelled speech into meaningful units. In this talk, I will argue that a combination of top-down and bottom-up modelling is advantageous in tackling these two problems.

To address the problem of frame-level representation learning, I will present the correspondence autoencoder (cAE), a neural network trained with weak top-down supervision from an unsupervised term discovery system. By combining this top-down supervision with unsupervised bottom-up initialization, the cAE yields much more discriminative features than previous approaches. I will then present our new unsupervised segmental Bayesian model that segments and clusters unlabelled speech into hypothesized words. By imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, our system outperforms several others on multi-speaker conversational English and Xitsonga speech data.

Herman is currently a Research Scholar at the Toyota Technological Institute at Chicago (TTIC), working with Karen Livescu. He recently submitted his PhD thesis at the University of Edinburgh, where he was supervised by Sharon Goldwater, Aren Jansen and Simon King. Before starting his PhD, he was a research associate and then a lecturer at Stellenbosch University, South Africa. His main interests are low-resource and unsupervised models for speech processing, and multi-modal models involving speech.

This event is not part of a series.

Created by Marcia G. Davidson on Wednesday, November 09, 2016 at 3:29 PM.