Taxonomic Data Integration from Multilingual Wikipedia Editions
Title | Taxonomic Data Integration from Multilingual Wikipedia Editions |
Publication Type | Miscellaneous |
Year of Publication | 2013 |
Authors | de Melo, G., & Weikum, G. |
Page(s) | 1-39 |
Other Numbers | 3389 |
Abstract | Information systems are increasingly making use of taxonomic knowledge about words and entities. A taxonomic knowledge base may reveal that the Lago di Garda is a lake and that lakes as well as ponds, reservoirs, and marshes are all bodies of water. As the number of available taxonomic knowledge sources grows, there is a need for techniques to integrate such data into combined, unified taxonomies. In particular, the Wikipedia encyclopedia has been used by a number of projects, but its multilingual nature has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is one of the largest multilingual lexical knowledge bases currently available. |
Acknowledgment | This work was partially funded by the Deutscher Akademischer Austausch Dienst (DAAD) through a postdoctoral fellowship. |
Bibliographic Notes | Knowledge and Information Systems, pp. 1-39 |
Abbreviated Authors | G. de Melo and G. Weikum |
ICSI Research Group | AI |
ICSI Publication Type | None |