ICSI submission to NIST's Text Analysis Conference (TAC)
Dan Gillick
ICSI
Tuesday, November 11, 2008
12:30
The ICSI multi-document summarization system relies on a general framework that casts summarization as a global optimization problem with an integer linear programming solution. Our primary submission, a simple sentence extractor with an n-gram frequency heuristic, gives results at least as good as any reported on the first part of the task.
Our secondary submission adds compressed sentence alternatives, achieving high ROUGE scores but lower manual scores. We also observe that an oracle version of our sentence extractor is nearly a direct optimization of ROUGE. We show oracle results for the TAC data set and discuss their significance.
Finally, we provide a detailed analysis of the linguistic quality of our two systems, suggesting specifically where improvements might be most useful.
Glossary:
ROUGE: a scheme for automatically evaluating summaries (like BLEU for machine translation evaluation)
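For readers unfamiliar with ROUGE, the core idea can be sketched as n-gram recall of a candidate summary against a reference summary. This is a simplified illustration only, not the official ROUGE toolkit: it assumes whitespace tokenization, lowercasing, and a single reference, and the function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=2):
    """Fraction of reference n-grams that also appear in the candidate.

    Assumption: plain whitespace tokenization and lowercasing, with no
    stemming, stopword handling, or multiple references.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match at the number of times the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))
```

Because the score is a recall over reference n-grams, a summarizer that selects sentences to cover as many reference n-grams as possible is optimizing something very close to this quantity.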