General context-aware data matching and merging framework

Authors: Slavko, Žitnik; Lovro, Šubelj; Dejan, Lavbič; Olegas, Vasilecas; Marko, Bajec
Year: 2013
Venue: Informatica
Product of the Action: No

Due to numerous public information sources and services, many methods to combine heterogeneous data have been proposed recently. However, general end-to-end solutions are still rare, especially systems that take different context dimensions into account. Therefore, the techniques often prove insufficient or are limited to a certain domain. In this paper we briefly review and rigorously evaluate a general framework for data matching and merging. The framework employs collective entity resolution and redundancy elimination using three dimensions of context types. In order to achieve domain-independent results, data is enriched with semantics and trust. However, the main contribution of the paper is an evaluation on five public domain-incompatible datasets. Furthermore, we introduce additional attribute, relationship, semantic and trust metrics, which allow complete framework management. Besides improving overall results within the framework, the metrics could be of independent interest.