Taxonomic Data Integration from Multilingual Wikipedia Editions

Title: Taxonomic Data Integration from Multilingual Wikipedia Editions
Publication Type: Miscellaneous
Year of Publication: 2013
Authors: de Melo, G., & Weikum, G.
Page(s): 1-39
Other Numbers: 3389
Abstract

Information systems are increasingly making use of taxonomic knowledge about words and entities. A taxonomic knowledge base may reveal that the Lago di Garda is a lake and that lakes as well as ponds, reservoirs, and marshes are all bodies of water. As the number of available taxonomic knowledge sources grows, there is a need for techniques to integrate such data into combined, unified taxonomies. In particular, the Wikipedia encyclopedia has been used by a number of projects, but its multilingual nature has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is one of the largest multilingual lexical knowledge bases currently available.

Acknowledgment

This work was partially funded by the Deutscher Akademischer Austausch Dienst (DAAD) through a postdoctoral fellowship.

Bibliographic Notes

Knowledge and Information Systems, pp. 1-39

Abbreviated Authors

G. de Melo and G. Weikum

ICSI Research Group

AI
