Stream Combination Before and/or After the Acoustic Model
Title | Stream Combination Before and/or After the Acoustic Model |
Publication Type | Technical Report |
Year of Publication | 2000 |
Authors | Ellis, D. P. W. |
Other Numbers | 1182 |
Abstract | Combining a number of diverse feature streams has proven to be a very flexible and beneficial technique in speech recognition. In the context of hybrid connectionist-HMM recognition, feature streams can be combined at several points. In this work, we compare two forms of combination: at the input to the acoustic model, by concatenating the feature streams into a single vector (feature combination or FC), and at the output of the acoustic model, by averaging the logs of the estimated posterior probabilities of each subword unit (posterior combination or PC). Based on four feature streams with varying degrees of mutual dependence, we find that the best combination strategy is a combination of feature and posterior combination, with streams that are more independent, as measured by an approximation to conditional mutual information, showing more benefit from posterior combination. |
URL | http://www.icsi.berkeley.edu/ftp/global/pub/techreports/2000/tr-00-007.pdf |
Bibliographic Notes | ICSI Technical Report TR-00-007 |
Abbreviated Authors | D. P. W. Ellis |
ICSI Research Group | Speech |
ICSI Publication Type | Technical Report |
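
The two combination points named in the abstract can be illustrated with a minimal sketch. This is not the report's implementation. The function names, the probability floor `eps`, and the final renormalization step are assumptions added for illustration; the abstract only specifies concatenating feature streams (FC) and averaging the logs of the per-stream posterior estimates (PC).

```python
import math


def feature_combination(stream_vectors):
    """FC: concatenate the per-frame feature vectors of all streams
    into a single vector for one acoustic model."""
    combined = []
    for vec in stream_vectors:
        combined.extend(vec)
    return combined


def posterior_combination(stream_posteriors, eps=1e-12):
    """PC: average the log posteriors of each subword unit across streams.

    Assumptions (not from the abstract): probabilities are floored at
    `eps` to avoid log(0), and the result is renormalized to sum to 1.
    """
    n_streams = len(stream_posteriors)
    n_units = len(stream_posteriors[0])
    avg_log = [
        sum(math.log(max(p[k], eps)) for p in stream_posteriors) / n_streams
        for k in range(n_units)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(avg_log)
    unnorm = [math.exp(v - peak) for v in avg_log]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical single-frame data: two streams, three subword units.
frame_features = feature_combination([[0.1, 0.2], [0.3]])
frame_posteriors = posterior_combination([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
```

Averaging in the log domain amounts to taking a geometric mean of the stream posteriors, so a unit that any one stream rates as very unlikely is penalized heavily in the combined estimate.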