Deep and Wide: Multiple Layers in Automatic Speech Recognition

Title: Deep and Wide: Multiple Layers in Automatic Speech Recognition
Publication Type: Journal Article
Year of Publication: 2012
Authors: Morgan, N.
Published in: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 20
Issue: 1
Page(s): 7-13
Other Numbers: 3227
Abstract

This paper reviews a line of research carried out over the last decade in speech recognition assisted by discriminatively trained, feedforward networks. The particular focus is on the use of multiple layers of processing preceding the hidden Markov model based decoding of word sequences. Emphasis is placed on the use of multiple streams of highly dimensioned layers, which have proven useful for this purpose. This paper ultimately concludes that while the deep processing structures can provide improvements for this genre, choice of features and the structure with which they are incorporated, including layer width, can also be significant factors.

Index Terms: Machine learning, multilayer perceptrons, speech recognition

Acknowledgment

The author would like to thank several colleagues for major contributions of the ideas that led to this paper: H. Bourlard of IDIAP and EPFL, H. Hermansky of Johns Hopkins, and A. Stolcke of SRI. The author would also like to thank anonymous reviewers (as well as O. Vinyals of ICSI and UCB, who provided internal criticism), who made important suggestions to improve the draft. Despite these contributions, the views expressed in this paper, and in particular the errors, can be blamed on the author alone.

URL: http://www.icsi.berkeley.edu/pubs/speech/deepandwide12.pdf
Bibliographic Notes

IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 1, pp. 7-13

Abbreviated Authors

N. Morgan

ICSI Research Group

Speech

ICSI Publication Type

Article in journal or magazine