Event


ICSI submission to NIST's Text Analysis Conference (TAC)

Dan Gillick

ICSI

Tuesday, November 11, 2008
12:30

The ICSI multi-document summarization system relies on a general framework that casts summarization as a global optimization problem with an integer linear programming solution. Our primary submission, a simple sentence extractor with an n-gram frequency heuristic, gives results at least as good as any reported on the first part of the task.
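As a rough illustration of the global-optimization view, the sketch below picks the set of sentences that covers the largest total weight of distinct bigrams within a word budget. It is not the ICSI system: exhaustive search stands in for an integer linear programming solver, and the tokenization, the sentence-frequency weighting, and the names `bigrams`, `sentence_frequency_weights`, and `best_extract` are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def bigrams(sentence):
    """Return the set of distinct word bigrams in a sentence (lowercased, whitespace-split)."""
    tokens = sentence.lower().split()
    return set(zip(tokens, tokens[1:]))


def sentence_frequency_weights(sentences):
    """Assumed heuristic: weight each bigram by the number of sentences that contain it."""
    counts = Counter()
    for sentence in sentences:
        counts.update(bigrams(sentence))
    return counts


def best_extract(sentences, weights, budget):
    """Exhaustively search for the sentence subset with the highest covered weight.

    Each distinct bigram is counted once no matter how many selected sentences
    contain it, so redundancy is penalized implicitly. The search is exponential
    in the number of sentences and is only suitable for tiny inputs; an ILP
    solver would handle the same objective and constraint at realistic sizes.
    """
    best_score, best_subset = 0, ()
    for size in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), size):
            length = sum(len(sentences[i].split()) for i in subset)
            if length > budget:
                continue  # violates the summary length constraint
            covered = set().union(*(bigrams(sentences[i]) for i in subset))
            score = sum(weights.get(b, 0) for b in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return [sentences[i] for i in best_subset], best_score


if __name__ == "__main__":
    sents = ["the cat sat", "the cat ran", "a dog barked loudly"]
    summary, score = best_extract(sents, sentence_frequency_weights(sents), budget=6)
    print(summary, score)
```

Scoring the selected set as a whole, rather than ranking sentences one at a time, is what makes the selection global: a sentence adds value only for the bigrams the rest of the summary does not already cover.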

Our secondary submission adds compressed sentence alternatives, achieving high ROUGE scores but lower manual scores. We also observe that an oracle version of our sentence extractor is nearly a direct optimization of ROUGE. We show oracle results for the TAC data set and discuss their significance.
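To make the connection between an oracle extractor and ROUGE concrete, the sketch below computes a simplified ROUGE-n recall: the fraction of reference n-grams that also appear in the candidate summary, with clipped counts. If an extractor's n-gram weights were taken from the reference summaries, maximizing covered weight would maximize a quantity close to this numerator. The function names are hypothetical, and the sketch omits the preprocessing options of the official ROUGE toolkit, such as stemming.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count word n-grams in a text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """Simplified ROUGE-n recall of a candidate against one or more references."""
    cand = ngram_counts(candidate, n)
    matched = total = 0
    for reference in references:
        ref = ngram_counts(reference, n)
        # Clip each match at the candidate's count so repeats are not over-credited.
        matched += sum(min(count, cand[gram]) for gram, count in ref.items())
        total += sum(ref.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    print(rouge_n_recall("the cat sat on the mat", ["the cat sat on a mat"]))
```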

Finally, we provide a detailed analysis of the linguistic quality of our two systems, suggesting specifically where improvements might be most useful.

Glossary:
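ROUGE (Recall-Oriented Understudy for Gisting Evaluation): an automatic evaluation metric that scores a summary by its n-gram overlap with human-written reference summaries (analogous to BLEU for machine translation)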

 