Modeling Dynamics in Connectionist Speech Recognition - The Time Index Model

TitleModeling Dynamics in Connectionist Speech Recognition - The Time Index Model
Publication TypeTechnical Report
Year of Publication1994
AuthorsKonig, Y., & Morgan N.
Other Numbers882

Here, we introduce an alternative to the Hidden Markov Model (HMM) as the underlying representation of speech production. HMMs suffer from well known limitations, such as the unrealistic assumption that the observations generated in a given state are independent and identically distributed (i.i.d). We propose a time index model that explicitly conditions the emission probability of a state on the time index, i.e., on the number of "visits" in the current state of the Markov chain in a sequence. Thus, the proposed model does not require an i.i.d. assumption. The connectionist framework enables us to represent the dependence on the time index as a non-parametric distribution and to share parameters between different speech unit models. Furthermore, we discuss an extension to the basic time index model by incorporating information about the duration of the phone segments. Our initial results show that given the position of the boundaries between basic speech units, e.g., phones, we can improve our current connectionist system performance significantly by using this model. However, we still do not know whether these boundaries can be estimated reliably, nor do we know how much benefit we can obtain from this method given less accurate boundary information. Currently we are experimenting with two possible approaches: trying to learn smooth probability densities for the boundaries, and getting a set of reasonable segmentations from an N-Best search. In both cases we will need to consider the effect of incorrect boundaries, since they will undoubtedly occur.

Bibliographic Notes

ICSI Technical Report TR-94-012

Abbreviated Authors

Y. Konig and N. Morgan

ICSI Publication Type

Technical Report