Generative Models for Unsupervised Language Learning
Tom Griffiths
UCB Psychology Department
Tuesday, February 10, 2009
12:30
Learning a language requires making inferences about the components of the language at many different levels, from sorting sounds into
phonemes to recognizing which words are semantically related. Human
learners typically make these inferences without direct instruction,
performing a kind of unsupervised learning. In statistics,
unsupervised learning is often treated as a problem of density
estimation: a class of generative models is specified, and learning
consists of estimating the parameters of that model. From this
perspective, understanding human learning reduces to a question of how
to specify appropriate generative models for natural language. I will
talk about recent work exploring two kinds of generative models for
unsupervised language learning: nonparametric Bayesian models, which
capture the statistics of natural languages in a way that helps
identify latent structure, and topic models, which pick out the
long-range correlations between word occurrences that are relevant to
identifying semantic relatedness.
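As a rough illustration of the first kind of model, the sketch below samples from the Chinese restaurant process, a standard nonparametric Bayesian prior over partitions. Whether the talk uses this particular process is an assumption. The function name `chinese_restaurant_process` and the parameters `alpha` and `seed` are hypothetical. The sketch shows two properties: the number of clusters ("tables") is not fixed in advance, and large clusters tend to grow larger, which yields the skewed size distributions often seen in word frequencies.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table assignments from a Chinese restaurant process (illustrative sketch).

    Customer i (0-indexed) joins existing table k with probability
    n_k / (i + alpha), or starts a new table with probability
    alpha / (i + alpha), where n_k is the number of customers already
    seated at table k. Names and parameters here are hypothetical.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of customers at table k
    assignments = []  # assignments[i] = table index chosen by customer i
    for i in range(n_customers):
        # Draw a point on [0, i + alpha]: the first i units are split among
        # existing tables in proportion to their sizes; the rest means "new table".
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        chosen = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                chosen = k
                break
        if chosen == len(table_sizes):
            table_sizes.append(0)
        table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes


if __name__ == "__main__":
    assignments, sizes = chinese_restaurant_process(1000, alpha=1.0)
    print("tables:", len(sizes))
    print("largest tables:", sorted(sizes, reverse=True)[:5])
```

Raising `alpha` makes new tables more likely, so more and smaller clusters appear. In a language model of this kind, tables would typically correspond to word types or latent categories.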