Segmental Recurrent Neural Networks for End-to-End Speech Recognition

Speaker: Liang Lu, Toyota Technological Institute at Chicago

Date: Tuesday, February 21, 2017

Time: 3:00 PM to 4:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: 32-G882 (Hewlett Room - Stata Center)

Host: Jim Glass, MIT CSAIL

Contact: Marcia G. Davidson, 617-253-3049, marcia@csail.mit.edu

Reminders to: seminars@csail.mit.edu, sls-seminars@csail.mit.edu

Reminder Subject: TALK: Segmental Recurrent Neural Networks for End-to-End Speech Recognition

Recently, there has been increasing interest in end-to-end speech recognition based on deep learning. Models that have shown promise for this task include neural attention models and connectionist temporal classification (CTC). In this talk, I will present a third type of model for this purpose. This model connects the segmental conditional random field (SCRF) with a recurrent neural network (RNN) encoder used for feature extraction. Unlike most previous SCRF-based approaches, this model does not require features or segment boundaries from an external system. Instead, it marginalizes over all possible segmentations. The SCRF component and the RNN feature extractor can be trained jointly, end to end. I will also compare this model to CTC and attention models, and demonstrate that they may be combined in a multi-task learning framework.
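
As a rough illustration of what marginalizing over all possible segmentations involves, the sketch below computes the log of the summed scores of every segmentation of a frame sequence with a forward-style dynamic program. It is not the speaker's implementation: the names log_partition, brute_force, seg_score, and max_len are hypothetical, segment labels are omitted (they are assumed to be folded into the segment score), and the segment score is an arbitrary function rather than an RNN output.

```python
import math
from itertools import combinations


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(num_frames, max_len, seg_score):
    """Log of the summed exp-scores of all segmentations.

    seg_score(start, end) is a hypothetical log-potential for the
    segment covering frames start .. end-1. Segments longer than
    max_len frames are not allowed (an assumption of this sketch).
    """
    # alpha[t] = log-sum over all segmentations of the first t frames.
    alpha = [0.0] + [-math.inf] * num_frames
    for t in range(1, num_frames + 1):
        terms = [
            alpha[t - d] + seg_score(t - d, t)
            for d in range(1, min(max_len, t) + 1)
        ]
        alpha[t] = logsumexp(terms)
    return alpha[num_frames]


def brute_force(num_frames, max_len, seg_score):
    """Same quantity by explicit enumeration (for small checks only)."""
    totals = []
    for k in range(num_frames):
        for cuts in combinations(range(1, num_frames), k):
            bounds = (0, *cuts, num_frames)
            segs = list(zip(bounds, bounds[1:]))
            if all(e - s <= max_len for s, e in segs):
                totals.append(sum(seg_score(s, e) for s, e in segs))
    return logsumexp(totals)
```

The dynamic program needs on the order of num_frames times max_len segment evaluations, whereas explicit enumeration grows exponentially with the number of frames. With a constant score of zero, four frames, and no effective length limit, log_partition returns log 8, since four frames can be split in 2^3 ways.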

Liang Lu is a Research Assistant Professor at the Toyota Technological Institute at Chicago, a philanthropically endowed academic computer science institute located on the University of Chicago campus. He received his Ph.D. from the University of Edinburgh in 2013, supervised by Prof. Steve Renals. He then worked as a Postdoctoral Research Associate at the same university until 2016. He has broad research interests in speech and language processing, and he received the best paper award for his work on low-resource pronunciation modeling at the 2013 IEEE ASRU workshop.

This event is not part of a series.

Created by Marcia G. Davidson on Wednesday, February 15, 2017 at 6:37 PM.