Spoken Term Detection - A Loss for Words

Speaker: Michael Picheny, IBM TJ Watson Research Center

Date: Wednesday, March 05, 2014

Host: Jim Glass and Victor Zue, MIT CSAIL

This talk, originally planned for February 5 and cancelled due to winter storm conditions, has been rescheduled for March 5.

As speech recognition continues to improve, new applications of the technology have been enabled. It is now common to search for information and send accurate short messages by speaking into a cellphone - something completely impractical just a few years ago. Another application that has recently been gaining attention is "Spoken Term Detection" - using speech recognition technology to locate keywords or phrases of interest in running speech of variable quality. Spoken Term Detection can be used to issue real-time alerts, rapidly identify multimedia clips of interesting content, and, when combined with search technology, even provide real-time commentary during broadcasts and meetings. This talk will describe the basics of Spoken Term Detection systems, including recent advances in core speech recognition technology, performance metrics, how out-of-vocabulary queries are handled, and ways of using score normalization and system combination to dramatically improve system performance.

Michael Picheny is the Senior Manager of the Speech and Language Algorithms Group at the IBM TJ Watson Research Center. Michael has worked in the speech recognition area since 1981, joining IBM after finishing his doctorate at MIT. He has been heavily involved in the development of almost all of IBM's recognition systems, ranging from the world's first real-time large vocabulary discrete system through IBM's product lines for telephony and embedded systems. He has published numerous papers in both journals and conferences on almost all aspects of speech recognition. He has received several awards from IBM for his work, including a corporate award, three Outstanding Technical Achievement Awards, and two Research Division Awards. He is the co-holder of over 30 patents and was named a Master Inventor by IBM in 1995 and again in 2000. Michael served as an Associate Editor of the IEEE Transactions on Acoustics, Speech, and Signal Processing from 1986 to 1989, was the chairman of the Speech Technical Committee of the IEEE Signal Processing Society from 2002 to 2004, and is a Fellow of the IEEE. He served as an Adjunct Professor in the Electrical Engineering Department of Columbia University in 2009 and co-taught a course in speech recognition. He recently completed an eight-year term of service on the board of ISCA (International Speech Communication Association). Most recently he was the co-general chair of the IEEE ASRU 2011 Workshop in Hawaii.


This CSAIL seminar series, organized in cooperation with the Siri team at Apple, invites leading researchers in human language technology (HLT) to give lectures that introduce the fundamentals of spoken language systems, assess the current state of the art, outline challenges, and speculate on how they can be met. Lectures occur 2-3 times per semester and should be accessible to undergraduates with some technical background.

