Making Automatic Speech Recognition More Robust to Fast Speech
Title | Making Automatic Speech Recognition More Robust to Fast Speech |
Publication Type | Technical Report |
Year of Publication | 1995 |
Authors | Mirghafori, N., Fosler-Lussier, E., & Morgan, N. |
Other Numbers | 1007 |
Keywords | Automatic Speech Recognition, Duration Modeling, Robustness, Speaking Rate |
Abstract | Psychoacoustic studies show that human listeners are sensitive to speaking rate variations. Automatic speech recognition (ASR) systems are affected even more by changes in rate: on many ASR systems, word recognition error rates for fast speakers have been observed to be two to four times those for average speakers. In this work, we have studied the causes of the higher error and concluded that both acoustic-phonetic and phonological differences contribute to the higher word error rates. We have also studied various measures for quantifying rate of speech (ROS), and used simple methods for estimating the speaking rate of a novel utterance using ASR technology. We have implemented mechanisms that make our ASR system more robust to fast speech. Using our ROS estimator to identify fast sentences in the test set, our rate-dependent system has 24.5% fewer errors on the fastest sentences and 6.2% fewer errors on all sentences of the WSJ93 evaluation set, relative to the baseline HMM/MLP system. These results were achieved using some gross approximations: adjustment for a single rate over an entire utterance, hand-tweaked rather than optimal transition parameters, and quantization of rate effects to two levels (fast and not fast). |
URL | http://www.icsi.berkeley.edu/ftp/global/pub/techreports/1995/tr-95-067.pdf |
Bibliographic Notes | ICSI Technical Report TR-95-067 |
Abbreviated Authors | N. Mirghafori, E. Fosler, and N. Morgan |
ICSI Publication Type | Technical Report |