2nd International KEYSTONE Conference

IKC 2016

Cluj-Napoca, Romania, 8-9 September 2016


Time Sept 8
9.00-10.30 Session I
10.30-10.50 Coffee Break
10.50-12.40 Session II
- Lunch
14.00-15.30 Session III
15.30-16.00 Coffee Break
16.00-17.30 Session IV

Time Sept 9
9.00-10.30 Session V
10.30-11.00 Coffee Break
11.00-12.30 Session VI
- Lunch
14.00-15.30 Session VII
15.30-16.00 Coffee Break
16.00-17.30 Session VIII

Session I

9.00 - 9.30 Welcome / Presentation

Chair: Andrea Calì
9.30 - 10.30 Invited Talk:
Can existing Semantic Web technologies support Semantic Search and Knowledge Discovery?
Maria-Esther Vidal (Computer Science Department of the University of Bonn, Germany)
Abstract: Over the last decade, Semantic Web initiatives have encouraged the publication of a large number of publicly available data sources using the Resource Description Framework (RDF) data model. Semantic Web technologies such as federated query engines and ontology-based tools have become building blocks for a wide range of multidisciplinary applications that require not only access to RDF data but also reasoning over the meaning of this data. In this talk, I will describe exemplar Semantic Web technologies and our work on query processing and data discovery tools, and discuss the role these tools play in semantic search and knowledge discovery. On the query processing side, I will describe adaptive and flexible query engines that adjust their execution schedulers to the current conditions of the data sources. The behavior of these query processing techniques will be illustrated on real-world RDF data sources and state-of-the-art benchmarks. On the knowledge discovery side, I will present graph partitioning techniques that exploit the meaning of the data, focusing on knowledge discovery methods that rely on the output of these graph partitioning tools to identify hidden associations among entities in a knowledge graph. The quality of these techniques will be illustrated in the biomedical domain, where previously unknown associations between drugs and genes have been discovered. Finally, I will discuss the role of query processing and knowledge discovery techniques in semantic search problems, and the open issues that must be addressed to provide efficient and effective solutions to these problems.

Session II: Information Extraction and Retrieval

10.50 - 12.40
Chair: Julian Szymanski

Colin Layfield, Joel Azzopardi and Chris Staff. Experiments with Document Retrieval from Small Text Collections using Latent Semantic Analysis or Term Similarity with Query Coordination and Automatic Relevance Feedback.
Philipp Ludwig, Marcus Thiel and Andreas Nürnberger. Unsupervised Extraction of Conceptual Keyphrases from Abstracts.
Joel Azzopardi, Fabio Benedetti, Francesco Guerra and Mihai Lupu. Back to the sketch-board: Integrating keyword search, semantics, and information retrieval.

Laura Po, Federica Rollo and Raquel Trillo Lado. Topic detection in multi-channel Italian newspapers.

Serwah Sabetghadam, Mihai Lupu and Andreas Rauber. Random Walks Analysis on Graph Modelled Multimodal Collections.

Session III

14.00 - 15.30
Chair: Andrea Calì

Taming Large Answers to Keyword Queries
Dan Olteanu (University of Oxford, UK)
Abstract: Despite their wide adoption in industry, structured databases still pose difficulties to a large base of users who are not familiar with database query languages and complex schemas. A flurry of efforts in academia and industry aims to bridge this gap by supporting simple keyword queries, much in the spirit of access methods for unstructured Web data. However, keyword queries tend to be less precise than SQL queries written by experts, and as a consequence the sheer number of their answers may easily overwhelm users.
In this talk, I will give an overview of an approach that factorizes the representation and computation of large query answers. The key insight is that factorization avoids redundancy in the representation and computation of query answers and may require orders of magnitude less space and time. Furthermore, factorized representations of query answers may be more intuitive to users than a standard flat listing, as they explicitly highlight cluster relationships between values within and across columns. Factorization can also benefit subsequent processing: aggregates and complex analytics, such as learning regression models, can be performed in one pass over factorized query answers and thus inherit the good performance of factorized computation.

ETI - the Recognos Smart Data Platform
Mihai Dinsoreanu (Chief Technology Officer at Recognos Cluj-Napoca)
Bio/Abstract: Mihai is Chief Technology Officer at Recognos Cluj-Napoca (http://www.recognos.com) and brings over 20 years of experience in technology companies, leading large, cross-border software development projects across Europe and the US. He is a leader within the software industry and an associate professor at the Technical University of Cluj-Napoca. His areas of expertise include Software Architecture and Software Engineering, with a focus on cloud and distributed applications. He holds a Ph.D. in Computer Science from the Technical University of Cluj-Napoca.
Recognos offers the ETI (Extract, Transform, Integrate) smart data platform, which enables users to seamlessly use and integrate both structured and unstructured data. The platform is used in the data preparation process for Data Lakes, Data Analytics and Knowledge Discovery applications. Its main goal is to normalize and integrate data stored in unstructured, semi-structured and structured content. It uses a range of techniques and technologies to extract data from unstructured content and map it to semi-structured and structured content. The data unification process is driven by the business ontology, which is the central pivot for the data extraction taxonomy, semantic tagging and structured data mapping. The ultimate goal is to make data stored in unstructured, semi-structured and structured content usable through a unique semantic meaning and a single query language.

Session IV: Text and Digital Libraries

16.00 - 17.30
Chair: Gilles Falquet

Javier Lacasta, Gilles Falquet, Javier Nogueras Iso and Javier Zarazaga-Soria. A software processing chain for evaluating thesaurus quality.

Joel Azzopardi, Dragan Ivanovic and Georgia Kapitsaki. Comparison of Collaborative and Content-based Automatic Recommendation Approaches in a Digital Library of Serbian PhD Dissertations.
Ranka Stankovic, Cvetana Krstev, Dusko Vitas, Nikola Vulovic and Olivera Kitanovic. Keyword-based search on bilingual digital libraries.
Slobodan Beliga and Sanda Martincic-Ipsic. Network-Enabled Keyword Extraction for Under-Resourced Languages.

Session V

9.00 - 10.30
Chair: Florin Pop

Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Stefan Dietze (L3S Research Center, University of Hannover, DE)
Abstract: While the Web of (entity-centric) data, consisting of Linked Data and knowledge graphs, has seen tremendous growth over the past years, take-up and reuse are still limited. Datasets vary heavily in characteristics such as scale, quality, coverage and dynamics, which poses challenges for tasks such as entity retrieval or search. This talk will provide an overview of approaches to deal with the increasing dynamics and heterogeneity of Web data. In the first part, approaches for focused crawling, linking, profiling and retrieval will be discussed as means to enable discovery and search of entity-centric data in the Web of Linked Data. In the second part, we will turn towards embedded markup, such as Microdata and RDFa, as a novel source of entity-centric knowledge. Driven by initiatives such as schema.org, markup has seen increasing adoption over the last few years and is already used by 30% of all Web pages. It constitutes an increasingly important source of entity-centric Web data, which is growing and evolving at the same order of magnitude as the Web itself. We will present case studies and ongoing work on data fusion from markup data to aid tasks such as entity retrieval and knowledge base augmentation. Future directions concern the exploitation of the complementary nature of markup data and traditional knowledge graphs.

VRE4EIC - A Europe-wide Interoperable Virtual Research Environment to Empower Multidisciplinary Research Communities (http://www.vre4eic.eu/)
Dragan Ivanovic (University of Novi Sad, Serbia)
Abstract: VRE4EIC develops a reference architecture and software components for VREs (Virtual Research Environments). This e-VRE bridges existing e-RIs (e-Research Infrastructures) such as EPOS and ENVRI+, both represented in the project, which are themselves supported by e-Is (e-Infrastructures) such as GEANT, EUDAT, PRACE, EGI and OpenAIRE. The e-VRE provides a convenient, homogeneous interface for users by virtualising access to the heterogeneous datasets, software services and resources of the e-RIs, and also provides collaboration/communication facilities to improve research communication. Finally, it provides access to research management/administrative facilities, so that the end user has a complete research environment.


Session VI

11.00 - 12.40
Chair: Javier Lacasta

Providing Enterprise-Ready Big Data Platform: Lessons Learned and Foresights
Radu Tudoran (Huawei Research Engineer - Big Data Expert IT R&D Division)
Abstract: The Big Data domain receives much attention nowadays, with large investments in both the private and public sectors. Investments range from building new business services that monetize data to addressing community needs or improving the lives of individuals. Sustaining such innovations across various verticals requires technical and research advancements at the level of the Big Data platform. A great deal of work happens in the open-source Hadoop ecosystem, driven by the community and freely available to anyone. However, collecting large volumes of data, storing them persistently, and providing SLA guarantees for highly reliable computation and persistent storage are just a few of the aspects that can require more than a vanilla platform, namely an enterprise-ready Big Data platform. In this talk, we will present such enhancements for an enterprise-ready Big Data platform and lessons learned from various successful use cases and projects. The talk will also discuss trends and insights about the future of Big Data platforms.

Documents, Semantics and Information Retrieval

Xenia Koulouri, Claudia Ifrim, Manolis Wallace and Florin Pop. Making sense of citations.

Andrea Calì and Ana Mestrovic. An Ontology-based Approach to Information Retrieval.

Session VII

14.00 - 15.30
Chair: Ranka Stankovic

Tomasz Boinski. Game with a Purpose for Verification of Mappings Between Wikipedia and WordNet.

Peter Macko and Viera Rozinajova. Using Natural Language to Search Linked Data.

Vagan Terziyan, Mariia Golovianko and Michael Cochez. TB-Structure: Collective Intelligence for Exploratory Keyword Search.

Stavroula Bampatzia, Omar Gustavo Bravo-Quezada, Angeliki Antoniou, Martin Lopez Nores, Manolis Wallace, George Lepouras and Costas Vasilakis. The use of semantics in the CrossCult H2020 project.

Session VIII

16.00 - 17.30
Panel: Bringing Big Data Analytics to Small & Medium Enterprises: The Role of Big Data and Cloud Computing in Searching and Understanding Large Volumes of Enterprise Data.
Moderator: Yannis Velegrakis (University of Trento)

    * Vagan Terziyan (University of Jyvaskyla)
    * Radu Tudoran (HUAWEI’s European Research Centre)