A Study of the Extended Baum-Welch Algorithm and Conditional Maximum Likelihood Estimation Using Hidden Markov Models
Steve Wegmann
Nuance
Friday, August 22, 2008
2:00 pm
Maximum mutual information (MMI), a special case of conditional
maximum likelihood, is an estimation technique which has been
successfully applied to speech recognition problems using hidden
Markov models for many years: at first on small tasks using
multinomial models, later on small tasks with multivariate normal
models, and now on very large tasks using mixtures of
multivariate normal models. Although MMI has been extremely
successful at reducing the word error rate on difficult speech
recognition tasks, the current MMI machinery, the Extended Baum-Welch
algorithm, is not as well understood as maximum likelihood estimation
using the Baum-Welch algorithm. Previous work in the relevant
literature has concentrated on showing that the conditional likelihood
increases with each iteration of Extended Baum-Welch, but it has
largely ignored questions such as: what happens to the model
parameters during repeated iterations of Extended Baum-Welch? In this
talk, we examine this and related questions by running hundreds of
iterations of Extended Baum-Welch on a standard Wall Street Journal
task. We also report on preliminary work which attempts to identify
why Extended Baum-Welch is so successful at reducing the error rate.
---- bio ----
Steven Wegmann is a Principal Research Scientist at Nuance
Communications where he studies the mathematical and statistical
properties of algorithms used in speech recognition and heads a small
group of acoustic modelers.
Steven came to Nuance via its acquisition of VoiceSignal Technologies.
At VoiceSignal he was the lead acoustic modeler in a small group that
developed models for speaker-independent name and digit recognition
products in more than twenty languages that shipped in hundreds of
millions of cell phones. This group also developed models for
discrete and continuous dictation, mobile search, and HMM-based
synthesis products.
Before joining VoiceSignal and before a brief post-acquisition stint
at Lernout and Hauspie, Steven was a member of the research department
at Dragon Systems where he was a key contributor on a variety of
government-sponsored research projects. Earlier in his career, he was
a mathematician who specialized in algebraic topology. He obtained
his Ph.D. in mathematics at the University of Warwick while he was a
Marshall Scholar.