Enabling community-driven information integration through clustering. Distributed and Parallel Databases

Authors: Khalid Belhajjame; Norman W. Paton; Cornelia Hedeler; Alvaro A. A. Fernandes
Year: 2015
Venue: Distributed and Parallel Databases
Link: http://tinyurl.com/ofo4236
Product of the Action: No

It has become widely recognized that user feedback can play a fundamental role in facilitating information integration tasks, e.g., the construction of integration schemas and the specification of schema mappings. While promising, existing proposals assume that the users providing feedback expect the same results from the integration system. In practice, however, different users may anticipate different results, e.g., because of their preferences or application of interest, in which case the feedback they provide may conflict, thereby degrading the quality of the services provided by the integration system. In this paper, we present clustering strategies for grouping information integration users according to their expectations of the results delivered by the integration system. Beyond grouping users, we show that clustering results can serve as inputs to a wide range of functionalities that are relevant in the context of crowd-driven information integration. Specifically, we show that clustering can be used to identify feedback of relevance to a given user by exploiting the feedback provided by other users in the same cluster. We report on evaluation exercises that assess the effectiveness of the clustering strategies we propose, and showcase the benefit