Event

Speech Lunch Talk

Asli Celikyilmaz

Berkeley Initiative in Soft Computing, UC Berkeley

Tuesday, September 22, 2009
12:30

Automated question answering (QA), the ability of a machine to digest a natural-language corpus and answer questions posed by a user in natural language, is a challenging and active research topic in the machine learning and computational linguistics communities. Aside from the complexity of learning the rules of a language, one of the challenges in QA research is that a large amount of data is required to teach the machine the concepts it needs to reason correctly about natural language expressions. To date, extensive human input has been required to build such training sets or to build useful and effective expert systems. We therefore need more efficient and robust algorithms that help the machine learn concepts with considerably less human input. This talk will summarize the latest research at the Berkeley Initiative in Soft Computing (BISC) Laboratory on learning algorithms that aim to improve information extraction from large amounts of unlabeled data when only a small amount of labeled data is available.

Recent research suggests that graph-based semi-supervised learning is a promising approach to data-sparse learning problems in natural language processing, and that the imprecision in learning parameters and in the structure of the similarity functions can help to identify the uncertainties in building models of a system. Data-sparse learning relies on graphs that jointly represent each data point, but how best to formulate the graph representation remains an open research question. Moreover, it is infeasible to extract knowledge from very large datasets with existing methods. Thus, in the first part of the talk we will discuss our novel improvements on spectral learning: (i) a summarization algorithm on graphs, which constructs representative data points, along with their local density constraints, for the denser regions of the graph, and (ii) a new graph-representation algorithm for QA systems.

We will then present the architecture of a domain-limited question answering prototype currently under development. The main focus will be on the multi-module learning structure, together with an overview of the information extraction component, which is based on a textual entailment module.

In the last part of the talk, we will present recent work on the automatic extraction of semantic components (such as topic, focus, and event) from queries, and our novel graph-based approach to the segmentation problem. We will also present preliminary work on document retrieval for QA with less supervision, based on Bayesian latent methods.

 
Copyright © 2005 International Computer Science Institute. All Rights Reserved.