Why Does Maximum Mutual Information Estimation Work So Well?
Steven Wegmann and Larry Gillick
Nuance Communications
Tuesday, January 20, 2009
12:30
Why does maximum mutual information estimation (MMI) consistently
outperform maximum likelihood estimation (MLE) on speech recognition
tasks using hidden Markov models? A standard statistical argument
shows that if our model assumptions were correct, then MMI would not
outperform MLE. The natural question, then, is which erroneous model
assumptions MMI is compensating for. In this talk we attempt to
answer this question using two methods. In the first we simulate
training and test data that depart from our models in controlled ways
and examine recognition results before and after MMI. In the second
we assess how the data differ from our models by studying the expected and
observed properties of the scores that the models emit.