Hybrid MLP/Structured-SVM Tandem Systems for Large Vocabulary and Robust ASR
Title | Hybrid MLP/Structured-SVM Tandem Systems for Large Vocabulary and Robust ASR |
Publication Type | Conference Paper |
Year of Publication | 2014 |
Authors | Ravuri, S. |
Other Numbers | 3733 |
Abstract | Tandem systems based on multi-layer perceptrons (MLPs) have improved the performance of automatic speech recognition systems on both large vocabulary and noisy tasks. One potential problem of the standard Tandem approach, however, is that the MLPs generally used do not model temporal dynamics inherent in speech. In this work, we propose a hybrid MLP/Structured-SVM model, in which the parameters between the hidden layer and output layer and temporal transitions between output layers are modeled by a Structured-SVM. A Structured-SVM can be thought of as an extension to the classical binary support vector machine which can naturally classify structures such as sequences. Using this approach, we can identify sequences of phones in an utterance. We try this model on two different corpora, Aurora2 and the large-vocabulary section of the ICSI meeting corpus, to investigate the model's performance in noisy conditions and on a large-vocabulary task. Compared to a difficult Tandem baseline in which the MLP is trained using 2nd-order optimization methods, the MLP/Structured-SVM system decreases WER in noisy conditions by 7.9% relative. On the large vocabulary corpus, the proposed system decreases WER by 1.1% absolute compared to the 2nd-order Tandem system. |
Acknowledgment | This research was funded by a fellowship from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of the National Science Foundation. |
URL | https://www.icsi.berkeley.edu/pubs/speech/hybridmlp14.pdf |
Bibliographic Notes | Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Singapore |
Abbreviated Authors | S. Ravuri |
ICSI Research Group | Speech |
ICSI Publication Type | Article in conference proceedings |