Unsupervised Neural and Bayesian Models for Zero-Resource Speech Processing
Speaker: Herman Kamper, Toyota Technological Institute at Chicago (TTIC)
Date: Tuesday, November 15, 2016
Time: 4:00 PM to 5:00 PM (Eastern Time)
Public: Yes
Location: 32-G882 (Stata Center - Hewlett Room)
Host: Jim Glass, MIT CSAIL
Contact: Marcia G. Davidson, 617-253-3049, marcia@csail.mit.edu
In settings where only unlabelled speech data is available, zero-resource speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. There are two central problems in zero-resource speech processing: (i) finding frame-level feature representations which make it easier to discriminate between linguistic units (phones or words), and (ii) segmenting and clustering unlabelled speech into meaningful units. In this talk, I will argue that a combination of top-down and bottom-up modelling is advantageous in tackling these two problems.
To address the problem of frame-level representation learning, I will present the correspondence autoencoder (cAE), a neural network trained with weak top-down supervision from an unsupervised term discovery system. By combining this top-down supervision with unsupervised bottom-up initialization, the cAE yields much more discriminative features than previous approaches. I will then present our new unsupervised segmental Bayesian model that segments and clusters unlabelled speech into hypothesized words. By imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, our system outperforms several others on multi-speaker conversational English and Xitsonga speech data.
Herman is currently a Research Scholar at the Toyota Technological Institute at Chicago (TTIC), working with Karen Livescu. He recently submitted his PhD thesis at the University of Edinburgh, where he was supervised by Sharon Goldwater, Aren Jansen and Simon King. Before starting his PhD, he was a research associate and then a lecturer at Stellenbosch University, South Africa. His main interests are low-resource and unsupervised models for speech processing, and multi-modal models involving speech.
Created by Marcia G. Davidson on Wednesday, November 09, 2016 at 3:29 PM.