LINDA: Distributed Web-of-Data-Scale Entity Matching

Publication Type: Conference Paper
Year of Publication: 2012
Authors: Böhm, C., de Melo, G., Naumann, F., & Weikum, G.
Other Numbers: 3393

Linked Data has emerged as a powerful way of interconnecting structured data on the Web. However, the cross-linkage between Linked Data sources is not as extensive as one would hope for. In this paper, we formalize the task of automatically creating "sameAs" links across data sources in a globally consistent manner. Our algorithm, presented in a multi-core as well as a distributed version, achieves this link generation by accounting for joint evidence of a match. Experiments confirm that our system scales beyond 100 million entities and delivers highly accurate results despite the vast heterogeneity and daunting scale.


This work was partially funded by the Deutscher Akademischer Austausch Dienst (DAAD) through a postdoctoral fellowship.

Bibliographic Notes

Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), pp. 2104-2108, Maui, Hawaii

Abbreviated Authors

C. Boehm, G. de Melo, F. Naumann, and G. Weikum

ICSI Research Group


ICSI Publication Type

Article in conference proceedings