Hybrid MLP/Structured-SVM Tandem Systems for Large Vocabulary and Robust ASR

Title: Hybrid MLP/Structured-SVM Tandem Systems for Large Vocabulary and Robust ASR
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Ravuri, S.
Other Numbers: 3733

Tandem systems based on multi-layer perceptrons (MLPs) have improved the performance of automatic speech recognition systems on both large vocabulary and noisy tasks. One potential problem of the standard Tandem approach, however, is that the MLPs generally used do not model the temporal dynamics inherent in speech. In this work, we propose a hybrid MLP/Structured-SVM model, in which the parameters between the hidden layer and output layer and temporal transitions between output layers are modeled by a Structured-SVM. A Structured-SVM can be thought of as an extension of the classical binary support vector machine that can naturally classify “structures” such as sequences. Using this approach, we can identify sequences of phones in an utterance.
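To illustrate what classifying a sequence means here, the following is a minimal sketch of sequence decoding with linear per-frame scores and label-to-label transition scores, assuming a simple chain structure. It is not the paper's implementation: the function names, the toy weights, and the toy hidden activations are all hypothetical, and the max-margin training of a Structured-SVM is not shown, only the decoding step.

```python
from typing import List, Sequence


def frame_scores(hidden: Sequence[Sequence[float]],
                 weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Linear score for every (frame, label) pair from hidden-layer activations."""
    return [[sum(w * h for w, h in zip(w_row, h_t)) for w_row in weights]
            for h_t in hidden]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the label sequence with the highest total emission + transition score."""
    n_labels = len(transitions)
    best = list(emissions[0])          # best score of any path ending in each label
    backptrs: List[List[int]] = []
    for e_t in emissions[1:]:
        new_best, ptrs = [], []
        for j in range(n_labels):
            # Best previous label for arriving at label j in this frame.
            i_star = max(range(n_labels),
                         key=lambda i: best[i] + transitions[i][j])
            ptrs.append(i_star)
            new_best.append(best[i_star] + transitions[i_star][j] + e_t[j])
        best = new_best
        backptrs.append(ptrs)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: best[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


# Hypothetical toy data: 3 frames, 2 hidden units, 2 labels.
hidden = [[1.0, 0.0], [0.4, 0.6], [1.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]           # one weight row per label
transitions = [[0.5, -0.5], [-0.5, 0.5]]     # staying in a label is rewarded

emissions = frame_scores(hidden, weights)
framewise = [max(range(2), key=lambda j: e[j]) for e in emissions]
sequence = viterbi_decode(emissions, transitions)
```

In this toy case the frame-by-frame choice flips to label 1 in the middle frame, while the sequence decoder keeps label 0 throughout because the transition scores penalize the switch. This is the kind of temporal information that a frame-level classifier alone does not use.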

We evaluate this model on two corpora, Aurora2 and the large-vocabulary section of the ICSI meeting corpus, to investigate its performance in noisy conditions and on a large-vocabulary task. Compared to a strong Tandem baseline in which the MLP is trained using 2nd-order optimization methods, the MLP/Structured-SVM system decreases word error rate (WER) in noisy conditions by 7.9% relative. On the large-vocabulary corpus, the proposed system decreases WER by 1.1% absolute compared to the 2nd-order Tandem system.
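The two results above are reported on different scales, one relative and one absolute. The short sketch below shows how the two measures differ. The function names and the example WER values are hypothetical and are not taken from the paper, which does not give baseline WERs in this abstract.

```python
def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in percentage points."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper).
baseline, improved = 20.0, 18.0
abs_gain = absolute_reduction(baseline, improved)   # percentage points
rel_gain = relative_reduction(baseline, improved)   # percent of the baseline
```

The same absolute gain corresponds to a larger relative gain when the baseline WER is lower, so the two figures should not be compared directly.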


This research was funded by a fellowship from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of the National Science Foundation.

Bibliographic Notes

Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Singapore

Abbreviated Authors

S. Ravuri

ICSI Research Group


ICSI Publication Type

Article in conference proceedings