Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR

Title: Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Gergen, S., Zeiler, S., Abdelaziz, A. H., Nickel, R., & Kolossa, D.
Published in: Proceedings of Interspeech 2016

Automatic speech recognition (ASR) enables very intuitive human-machine interaction. However, signal degradations due to reverberation or noise reduce the accuracy of audio-based recognition. The introduction of a second signal stream that is not affected by degradations in the audio domain (e.g., a video stream) increases the robustness of ASR against degradations in the original domain. Here, depending on the signal quality of audio and video at each point in time, a dynamic weighting of both streams can optimize the recognition performance. In this work, we introduce a strategy for estimating optimal weights for the audio and video streams in turbo-decoding-based ASR using a discriminative cost function. The results show that turbo decoding with this maximally discriminative dynamic weighting of information yields higher recognition accuracy than turbo-decoding-based recognition with fixed stream weights or optimally dynamically weighted audiovisual decoding using coupled hidden Markov models.
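As an illustration of what per-frame stream weighting means, the following is a minimal sketch of a weighted log-linear combination of audio and video state log-likelihoods, a common formulation in audiovisual ASR. It is not the turbo-decoding scheme or the discriminative weight estimator of the paper. The function name `combine_streams`, the toy likelihood values, and the hand-picked weights are all assumptions made for this example.

```python
import math


def combine_streams(log_b_audio, log_b_video, audio_weights):
    """Combine per-frame, per-state log-likelihoods of two streams.

    log_b_audio, log_b_video: lists of frames, each a list of finite
        log-likelihoods, one per state (hypothetical toy inputs).
    audio_weights: one weight per frame; the video stream gets 1 - weight.
    """
    combined = []
    for frame_a, frame_v, lam in zip(log_b_audio, log_b_video, audio_weights):
        lam = min(1.0, max(0.0, lam))  # keep the weight inside [0, 1]
        combined.append(
            [lam * a + (1.0 - lam) * v for a, v in zip(frame_a, frame_v)]
        )
    return combined


# Toy data: 2 frames, 2 states (values invented for illustration only).
log_b_audio = [[math.log(0.7), math.log(0.3)], [math.log(0.5), math.log(0.5)]]
log_b_video = [[math.log(0.4), math.log(0.6)], [math.log(0.1), math.log(0.9)]]

# Frame 0 trusts audio fully, frame 1 trusts video fully (assumed weights).
audio_weights = [1.0, 0.0]

combined = combine_streams(log_b_audio, log_b_video, audio_weights)

# Most likely state per frame under the combined score.
best_states = [frame.index(max(frame)) for frame in combined]
```

With a weight of 1.0 the combined score reduces to the audio score, and with 0.0 it reduces to the video score. A dynamic scheme would choose intermediate weights frame by frame according to the estimated reliability of each stream.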
