Score Normalization and System Combination for Keyword Spotting in Speech
Damianos Karakos, Raytheon BBN Technologies
Date: Thursday, November 21, 2013
Time: 4:00 PM to 5:00 PM (Note: all times are in the Eastern Time Zone)
Refreshments: 3:45 PM
Location: 32-G882 (Stata Center - Hewlett Room)
Host: Jim Glass, MIT CSAIL
Contact: Marcia G. Davidson, 617-253-3049, email@example.com
TALK: Score Normalization and System Combination for Keyword Spotting in Speech
Keyword spotting, the task of finding words or phrases of interest in audio, is related to, but still quite different from, speech recognition, where a verbatim transcript is desired. Because keyword spotting aims to extract specific, content-bearing words or phrases, it is crucial to use a performance measure that does not weight all word tokens equally. One such measure is the Actual Term Weighted Value (ATWV), which has been used in the IARPA-funded Babel project; this talk focuses on techniques that take maximizing ATWV as their optimization objective. Two such techniques are score normalization and system combination. Score normalization converts the scores of different keywords so that they are commensurate with each other and correspond more closely to the probability of being correct than raw posteriors do. System combination merges the detections of multiple systems, thus combining the strengths of different detection modalities, tokenizations, or models. Both techniques were applied successfully by BBN in the official evaluation of the Babel project in March/April of 2013, resulting in large gains, on the order of 8-10 points (absolute), in five different languages.
(This work was done in collaboration with Richard M. Schwartz; the contribution of BBN colleagues S. Tsakalidis, I. Bulyko, L. Zhang, S. Ranjan, T. Ng, R. Hsiao, G. Saikumar, L. Nguyen, J. Makhoul, as well as other members of the BABELON team, is gratefully acknowledged.)
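For readers unfamiliar with the measure named in the abstract, the following is a minimal Python sketch of how a term-weighted value can be computed and why per-keyword score normalization interacts with it. It follows the standard NIST definition (one minus the keyword-averaged sum of miss probability and weighted false-alarm probability, with weight 999.9). The function names, the tuple layout of a detection, and the "sum-to-one" normalization are illustrative assumptions, not necessarily the methods presented in the talk.

```python
from collections import defaultdict

BETA = 999.9  # false-alarm weight from the NIST term-weighted value definition


def atwv(detections, true_counts, speech_seconds, threshold=0.5, beta=BETA):
    """Term-weighted value at a fixed decision threshold.

    detections: list of (keyword, score, is_correct) tuples (hypothetical layout).
    true_counts: dict mapping keyword -> number of true occurrences in the audio.
    speech_seconds: total duration of the searched audio, in seconds.
    """
    hits = defaultdict(int)
    false_alarms = defaultdict(int)
    for keyword, score, is_correct in detections:
        if score >= threshold:
            if is_correct:
                hits[keyword] += 1
            else:
                false_alarms[keyword] += 1

    total_cost = 0.0
    scored_keywords = 0
    for keyword, n_true in true_counts.items():
        if n_true == 0:
            continue  # keywords that never occur are left out of the average
        p_miss = 1.0 - hits[keyword] / n_true
        p_fa = false_alarms[keyword] / (speech_seconds - n_true)
        total_cost += p_miss + beta * p_fa
        scored_keywords += 1

    if scored_keywords == 0:
        raise ValueError("no keyword has a true occurrence")
    # Every keyword contributes equally, however many tokens it has.
    return 1.0 - total_cost / scored_keywords


def sum_to_one(detections):
    """Illustrative normalization: divide each score by the keyword's score total."""
    totals = defaultdict(float)
    for keyword, score, _ in detections:
        totals[keyword] += score
    return [(kw, score / totals[kw], ok) for kw, score, ok in detections]
```

Because the average runs over keywords rather than tokens, a missed occurrence of a rare keyword costs far more than a missed occurrence of a frequent one, which is why a single global threshold on raw scores tends to be suboptimal and why making scores commensurate across keywords matters.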
Damianos Karakos has been a Research Scientist with Raytheon BBN Technologies since June 2012. He obtained his PhD in Electrical Engineering from the University of Maryland in 2002. He was a postdoctoral fellow in the Department of Electrical Engineering and the Center for Language and Speech Processing at Johns Hopkins University between 2003 and 2007. He became an Assistant Research Professor in 2007 and, additionally, a Research Scientist with the Human Language Technology Center of Excellence at JHU in 2011. His research interests lie in the general area of statistical pattern recognition, with a focus on speech and language applications.
Created by Marcia G. Davidson on Friday, November 15, 2013 at 4:36 PM.