Speech Lunch Talk
Asli Celikyilmaz
Berkeley Initiative in Soft Computing, UC Berkeley
Tuesday, September 22, 2009
12:30
Automated question answering (QA) – the ability of a machine to digest
a corpus in natural language and answer questions posed by a user in
natural language – is a challenging and active research topic in the
machine learning and computational linguistics communities. Aside from
the complexity of learning the rules of a language, one of the
challenges in QA research is that a large amount of data is required
to teach certain concepts to the machine to enable correct reasoning
about natural language expressions. To date, extensive human input is
required to build such training sets or useful and effective expert
systems. However, we need more efficient and robust algorithms that
can help the machine learn concepts with considerably less human
input. This talk will provide a summary of the latest research at the
Berkeley Initiative in Soft Computing (BISC) Laboratory on learning
algorithms aimed at improving information extraction from large
amounts of unlabeled data, given only a small amount of labeled data.
Recent research suggests that graph-based semi-supervised learning is
a promising approach to data-sparse learning problems in natural
language processing, and that the imprecision in learning parameters
and the structure of the similarity functions can help to identify the
uncertainties in building models of a system. Data-sparse learning
relies on graphs that jointly represent each data point, but the
problem of how to best formulate the graph representation remains an
open research topic. Moreover, it is infeasible to extract knowledge
from very large datasets with existing methods. Thus, in the first
part of the talk we will discuss our novel improvements to spectral
learning: (i) a summarization algorithm on graphs, which is used to
construct representative data points along with their local density
constraints for denser regions in the graph, and (ii) a new
graph-representation algorithm for QA systems.
We will then present the architecture of a domain-limited question
answering prototype currently under development. We will focus on the
multi-module learning structure and give an overview of the information
extraction component, which is based on a textual entailment module.
In the last part of the talk, we will present recent work on the
automatic extraction of semantic components (such as topic, focus, and
event) from queries, and our novel graph-based approach to the
segmentation problem. We will also present preliminary work on
document retrieval for QA with less supervision, based on Bayesian
latent methods.