2nd International KEYSTONE Conference
IKC 2016
Cluj-Napoca, Romania, 8-9 September 2016
Research track
Joel Azzopardi, Dragan Ivanovic and Georgia Kapitsaki.
Comparison of Collaborative and Content-based Automatic Recommendation Approaches in a Digital Library of Serbian PhD Dissertations.
Abstract:
Digital libraries have become an excellent information resource for researchers. However, users of digital libraries would be served better by having the relevant items 'pushed' to them. In this research, we present various automatic recommendation systems to be used in a digital library of Serbian PhD dissertations. We experiment with the use of Latent Semantic Analysis (LSA) in both content-based and collaborative recommendation approaches, and evaluate the use of different similarity functions. We find that the best results are obtained when using a collaborative approach that utilises LSA and Pearson similarity.
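For readers unfamiliar with the combination, the following minimal sketch shows how LSA and Pearson similarity can plug into a collaborative recommender. The interaction matrix and the number of latent factors k are invented for illustration; they are not taken from the paper.

```python
# Minimal sketch of an LSA + Pearson collaborative recommender.
# The matrix below is hypothetical; the paper's actual data are usage
# records from the Serbian PhD dissertation library.
import numpy as np

# Rows = users, columns = dissertations; 1 = the user accessed the item.
interactions = np.array([
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 1],
], dtype=float)

# LSA: truncated SVD keeps the k strongest latent factors.
k = 3
U, s, Vt = np.linalg.svd(interactions, full_matrices=False)
user_factors = U[:, :k] * s[:k]               # users in latent space

# Pearson similarity between users in the latent space.
sim = np.corrcoef(user_factors)

def recommend(user, n=2):
    """Score unseen items by similarity-weighted votes of other users."""
    weights = sim[user].copy()
    weights[user] = 0.0                        # ignore self-similarity
    scores = weights @ interactions            # aggregate neighbours' items
    scores[interactions[user] > 0] = -np.inf   # drop items already seen
    return np.argsort(scores)[::-1][:n]

print(recommend(0))
```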
Colin Layfield, Joel Azzopardi and Chris Staff.
Experiments with Document Retrieval from Small Text Collections using Latent Semantic Analysis or Term Similarity with Query Coordination and Automatic Relevance Feedback.
Abstract:
One of the problems faced by users of databases containing textual documents is the difficulty of retrieving relevant results, due to the diverse vocabulary used in queries and contained in relevant documents, especially when there are only a small number of relevant documents. This problem is known as the Vocabulary Gap. The PIKES team have constructed a small test collection of 331 articles extracted from a blog, and a Gold Standard for 35 queries selected from the blog's search log, so that the results of different approaches to semantic search can be compared. Prior approaches recognise Named Entities in documents and queries, as well as relations (including temporal relations), and represent them as 'semantic layers' in a retrieval system index. In this work, we take two different approaches that do not involve Named Entity Recognition. In the first approach, we process an unannotated version of the PIKES document collection using Latent Semantic Analysis and use a combination of query coordination and automatic relevance feedback, with which we outperform prior work. However, this approach is highly dependent on the underlying collection and is not necessarily scalable to massive collections. In our second approach, we use an LSA model generated by SEMILAR from a Wikipedia dump to generate a Term Similarity Matrix (TSM). We automatically expand the queries in the PIKES test collection with related terms from the TSM and submit them to a term-by-document matrix derived by indexing the PIKES collection using the Vector Space Model. Coupled with a combination of query coordination and automatic relevance feedback, we also outperform prior work with this approach. The advantage of the second approach is that it is independent of the underlying document collection.
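As an illustration of the second approach, here is a toy version of TSM-based query expansion. The vocabulary and similarity values are hypothetical stand-ins for the SEMILAR/Wikipedia-derived matrix used in the paper.

```python
# Sketch: expand a query with the most similar terms from a term
# similarity matrix (TSM) before submitting it to a vector-space index.
import numpy as np

vocab = ["ranking", "retrieval", "search", "semantics", "ontology"]
idx = {t: i for i, t in enumerate(vocab)}

# Hypothetical symmetric TSM (diagonal = 1.0).
tsm = np.array([
    [1.0, 0.7, 0.6, 0.1, 0.1],
    [0.7, 1.0, 0.8, 0.2, 0.1],
    [0.6, 0.8, 1.0, 0.2, 0.2],
    [0.1, 0.2, 0.2, 1.0, 0.7],
    [0.1, 0.1, 0.2, 0.7, 1.0],
])

def expand(query_terms, per_term=1):
    """Add the top related term(s) from the TSM to each query term."""
    expanded = set(query_terms)
    for t in query_terms:
        row = tsm[idx[t]].copy()
        row[idx[t]] = 0.0                      # exclude the term itself
        expanded.update(vocab[j] for j in np.argsort(row)[::-1][:per_term])
    return expanded

print(expand(["search"]))                      # {'search', 'retrieval'}
```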
Philipp Ludwig, Marcus Thiel and Andreas Nürnberger.
Unsupervised Extraction of Conceptual Keyphrases from Abstracts.
Abstract:
The extraction of meaningful keyphrases is important for a variety of applications, such as recommender systems, solutions for browsing literature, or automatic categorization of documents. Since this task is not trivial, a great number of different approaches have been introduced in the past, either focusing on single aspects of the process or utilizing the characteristics of a certain type of document. Especially when it comes to supporting the user in grasping the topics of a document (e.g. in the display of search results), precise keyphrases can be very helpful. However, in such situations usually only the abstract or a short excerpt is available, which most approaches do not acknowledge. Methods based on word frequency are not appropriate in this case, since short texts do not contain sufficient word statistics for a frequency analysis. Furthermore, many existing methods are supervised and therefore depend on domain knowledge or manually annotated data, which in many scenarios are not available. We therefore present an unsupervised graph-based approach for extracting meaningful keyphrases from abstracts of scientific articles. We show that even though our method is not based on manually annotated data or corpora, it works surprisingly well.
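A minimal sketch of graph-based keyphrase extraction in the same spirit (co-occurrence graph plus centrality ranking) follows; it is not the authors' exact algorithm, and the stop-word list and window size are illustrative choices.

```python
# Build a word co-occurrence graph over a short abstract and rank
# words by PageRank centrality (TextRank-style, for illustration only).
import networkx as nx

abstract = ("unsupervised graph based extraction of keyphrases from "
            "abstracts of scientific articles")
stop = {"of", "from", "based"}
tokens = [t for t in abstract.split() if t not in stop]

# Link words that co-occur within a sliding window of 3 tokens.
g = nx.Graph()
for i in range(len(tokens)):
    for j in range(i + 1, min(i + 3, len(tokens))):
        g.add_edge(tokens[i], tokens[j])

# The top-ranked words seed the candidate keyphrases.
scores = nx.pagerank(g)
print(sorted(scores, key=scores.get, reverse=True)[:5])
```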
Joel Azzopardi, Fabio Benedetti, Francesco Guerra and Mihai Lupu.
Back to the sketch-board: Integrating keyword search, semantics, and information retrieval.
Abstract:
We reproduce recent research results combining semantic and information retrieval methods. Additionally, we expand the existing state of the art by combining the semantic representations with IR methods from the probabilistic relevance framework. We demonstrate a significant increase in performance, as measured by standard evaluation metrics.
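The probabilistic relevance framework mentioned here is typified by BM25; the toy scorer below shows that component in isolation (with the usual k1 and b defaults), not the paper's combined semantic pipeline.

```python
# BM25 scoring over a toy collection of tokenized documents.
import math

docs = [["semantic", "keyword", "search"],
        ["keyword", "search", "engine", "search"],
        ["semantic", "web"]]
k1, b = 1.2, 0.75
avgdl = sum(len(d) for d in docs) / len(docs)
N = len(docs)

def bm25(query, doc):
    score = 0.0
    for term in query:
        n = sum(term in d for d in docs)       # document frequency
        if n == 0:
            continue
        idf = math.log((N - n + 0.5) / (n + 0.5) + 1)
        tf = doc.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

print([bm25(["semantic", "search"], d) for d in docs])
```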
Laura Po, Federica Rollo and Raquel Trillo Lado.
Topic detection in multi-channel Italian newspapers.
Abstract:
Nowadays, any person, company or public institution uses and exploits different channels to share private or public information with other people (friends, customers, relatives, etc.) or institutions. This context has changed journalism: major newspapers report news not just on their own web sites, but also on several social media such as Twitter or YouTube. The use of multiple communication media creates a need to integrate and analyse the content published globally, not just at the level of a single medium. An analysis is needed to achieve a comprehensive overview of the information that reaches end users and of how they consume it. This analysis should identify the main topics in the news flow and reveal the mechanisms by which news is published on different media (e.g. the news timeline). Currently, most work in this area is still focused on a single medium, so an analysis across different media (channels) should improve the results of topic detection. This paper shows the application of a graph-analytical approach, called keygraph, to a set of very heterogeneous documents: the news published on various media. A preliminary evaluation on the news published over a 5-day period was able to identify the main topics within the publications of a single newspaper, and also within the publications of 20 newspapers on several online channels.
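Roughly, the graph-analytical idea can be sketched as follows: build a term co-occurrence graph over headlines from several channels and read its connected components as candidate topics. The real keygraph algorithm is more involved (it distinguishes foundation terms from bridging key terms), and the headlines below are invented.

```python
# Co-occurrence graph over multi-channel headlines; each connected
# component approximates one topic shared across the channels.
import itertools
import networkx as nx

headlines = [
    "election results announced in rome",
    "rome election turnout at record high",
    "storm warning issued for sicily",
    "sicily storm damages coastal towns",
]
g = nx.Graph()
for h in headlines:
    for a, b in itertools.combinations(set(h.split()), 2):
        g.add_edge(a, b)

for comp in nx.connected_components(g):
    print(sorted(comp))
```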
Serwah Sabetghadam, Mihai Lupu and Andreas Rauber.
Random Walks Analysis on Graph Modelled Multimodal Collections.
Abstract:
Nowadays, there is a proliferation of information objects from different modalities: text, image, audio and video. The different types of relations between information objects (e.g. similarity or semantic relations) have motivated graph-based search in multimodal Information Retrieval. In this paper, we formulate a Random Walks problem on our model for multimodal IR, which is robust over different distributions of modalities. We investigate query-dependent and query-independent Random Walks on our model. The results show that query-dependent Random Walks provide higher precision than query-independent Random Walks. We additionally investigate the contribution of the graph structure (quantified by the number and weights of incoming and outgoing links) to the final ranking in both types of Random Walks, and observe that query-dependent Random Walks are less dependent on the graph structure. The experiments are performed on a multimodal collection of about 400,000 documents and images.
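The query-dependent versus query-independent contrast corresponds to a random walk with and without a query-biased restart distribution. Below is a minimal sketch assuming a toy graph and a hypothetical set of query-matching nodes; it is not the authors' exact model.

```python
# Query-independent walk = classic PageRank (uniform restart);
# query-dependent walk = personalized PageRank biased to query hits.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("img1", "doc1"), ("doc1", "doc2"),
                  ("doc2", "img2"), ("doc1", "img2")])

independent = nx.pagerank(g)                       # uniform restart

query_hits = {"doc1": 1.0}                         # nodes matching the query
dependent = nx.pagerank(g, personalization=query_hits)

print(sorted(dependent, key=dependent.get, reverse=True))
```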
Tomasz Boinski.
Game with a Purpose for Verification of Mappings Between Wikipedia and WordNet.
Abstract:
The paper presents a Game with a Purpose for the verification of automatically generated mappings, focusing on mappings between WordNet synsets and Wikipedia articles. A general description of the idea behind games with a purpose is given. The TGame system, a 2D platform mobile game with the verification process embedded in the game-play, is described. Additional mechanisms for anti-cheating, increasing player motivation and gathering feedback are also presented. An evaluation of the proposed solution and future work are also described.
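One plausible aggregation step behind such verification (not necessarily TGame's actual rule) is to accept a mapping once enough independent players agree; the thresholds and the vote log below are invented.

```python
# Accept a synset-article mapping once enough player verdicts agree.
from collections import Counter

votes = {  # (wordnet_synset, wikipedia_article) -> player verdicts
    ("dog.n.01", "Dog"): [True, True, True, False, True],
    ("bank.n.01", "Bank_(geography)"): [False, False, True],
}

MIN_VOTES, MIN_AGREEMENT = 3, 0.8

for mapping, verdicts in votes.items():
    if len(verdicts) >= MIN_VOTES:
        ratio = Counter(verdicts)[True] / len(verdicts)
        status = "accept" if ratio >= MIN_AGREEMENT else "needs review"
        print(mapping, status)
```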
Javier Lacasta, Gilles Falquet, Javier Nogueras Iso and Javier Zarazaga-Soria.
A software processing chain for evaluating thesaurus quality.
Abstract:
Thesauri are knowledge models commonly used for information classification and retrieval, whose structure is defined by standards that describe the main features the concepts and relations must have. However, following these standards requires deep knowledge of the field the thesaurus is going to cover and experience in thesaurus creation. To help in this task, this paper describes a software processing chain that provides different validation components evaluating the quality of the main thesaurus features.
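As an example of the kind of validation component such a chain might include, the sketch below detects cycles in the broader-term (BT) hierarchy, a structural defect the thesaurus standards forbid; the relation data are invented.

```python
# Detect cycles in the broader-term hierarchy of a toy thesaurus.
import networkx as nx

# concept -> broader concept (BT edges); contains a deliberate cycle.
bt = [("poodle", "dog"), ("dog", "mammal"),
      ("mammal", "animal"), ("animal", "poodle")]

g = nx.DiGraph(bt)
cycles = list(nx.simple_cycles(g))
if cycles:
    print("BT hierarchy contains cycles:", cycles)
```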
Xenia Koulouri, Claudia Ifrim, Manolis Wallace and Florin Pop.
Making sense of citations.
Abstract:
To this day, the analysis of citations has mainly aimed at exploring different ways to count them, such as the total count, the h-index or the s-index, in order to quantify a researcher's overall contribution and impact. In this work we show how considering the structured metadata that accompany citations, such as the publication outlet in which they appeared, can lead to a considerably more insightful understanding of the ways in which a researcher has impacted the work of others.
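Of the counts mentioned, the h-index is easily made concrete: it is the largest h such that h of the researcher's papers have at least h citations each. The citation counts below are invented.

```python
# h-index: the largest h such that h papers have >= h citations.
def h_index(citations):
    citations = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(citations, start=1) if c >= i)

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```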
Ranka Stankovic, Cvetana Krstev, Dusko Vitas, Nikola Vulovic and Olivera Kitanovic.
Keyword-based search on bilingual digital libraries.
Abstract:
This paper outlines the main features of Bibliša, a tool that offers various possibilities for enhancing queries submitted to large collections of aligned parallel texts residing in a bilingual digital library. Bibliša supports keyword queries as an intuitive way of specifying information needs. The keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically and into the other language, using various supporting monolingual and bilingual resources. The terminological and lexical resources are of various types, such as wordnets, electronic dictionaries, and SQL and NoSQL databases, distributed across different servers and accessed in various ways. The web application has been tested on a collection of texts from 3 journals and 2 projects, comprising 299 documents generated from TMX and stored in a NoSQL database. The tool allows full-text and metadata search, with extraction of concordance sentence pairs to support translation and terminology work.
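Schematically, the expansion step grows a keyword morphologically, semantically and across languages before retrieval; all three lookup tables below are toy stand-ins for Bibliša's dictionaries and wordnets.

```python
# Expand a keyword through three kinds of lexical resources.
morphology = {"library": ["libraries"]}
synonyms   = {"library": ["collection"]}
bilingual  = {"library": ["biblioteka"]}   # English -> Serbian

def expand(keyword):
    expanded = {keyword}
    for resource in (morphology, synonyms, bilingual):
        expanded.update(resource.get(keyword, []))
    return expanded

print(expand("library"))
```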
Peter Macko and Viera Rozinajova.
Using Natural Language to Search Linked Data.
Abstract:
There are many endeavors aiming to offer users more effective ways of getting relevant information from the web. One of them is the concept of Linked Data, which provides interconnected data sources. But querying these types of data is difficult not only for conventional web users but also for experts in the field. Therefore, a more comfortable way of formulating user queries would be of great value. One direction is to allow the user to use natural language. To make this task easier, we have proposed a method for translating natural language queries into SPARQL queries. It is based on sentence structure, utilizing dependencies between the words in user queries. The dependencies are used to map the query to the semantic web structure, which is then translated into a SPARQL query. According to our first experiments, we are able to answer a significant group of user queries.
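The final step of such a pipeline can be sketched as follows: once dependency parsing has produced a (subject, predicate, object) pattern, it is serialized as SPARQL. The DBpedia prefixes are real, but the predicate mapping table is a hypothetical simplification of what dependency-based mapping would produce.

```python
# Turn a parsed (subject, predicate, object) pattern into a SPARQL query.
predicate_map = {"directed": "dbo:director", "wrote": "dbo:author"}

def to_sparql(subject_var, predicate, obj):
    prop = predicate_map[predicate]
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX dbr: <http://dbpedia.org/resource/>\n"
        f"SELECT ?{subject_var} WHERE {{ ?{subject_var} {prop} dbr:{obj} . }}"
    )

# "Who directed Alien?" -> parse -> (?who, directed, Alien)
print(to_sparql("who", "directed", "Alien"))
```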
Vagan Terziyan, Mariia Golovianko and Michael Cochez.
TB-Structure: Collective Intelligence for Exploratory Keyword Search.
Abstract:
This paper presents a search optimization technique dealing with large-scale content and the open-world assumption. The aim is to increase search effectiveness by predicting a seeker's implicit intents at early stages of the search process. This is done by uncovering behavioral patterns in large sets of previously collected data describing collective search experience. We apply a specific tree-based data structure called a TB (There-and-Back) structure for compact storage of search history in the form of merged query trails: sequences of queries iteratively approaching a seeker's goal. The organization of TB-structures allows inferring new implicit trails for the prediction of a seeker's intents.
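A speculative rendering of the underlying idea: store query trails in a prefix tree so that the continuations of matching past trails predict the seeker's next queries. The trails are invented, and the real TB-structure merges trails in both directions, which this sketch omits.

```python
# Prefix tree of query trails; continuations predict likely next queries.
from collections import defaultdict

trie = lambda: defaultdict(trie)   # recursive dict-of-dicts node
root = trie()

def add_trail(trail):
    node = root
    for q in trail:
        node = node[q]

def predict(prefix):
    node = root
    for q in prefix:
        if q not in node:
            return []
        node = node[q]
    return list(node)              # queries seen right after this prefix

add_trail(["jaguar", "jaguar car", "jaguar xk price"])
add_trail(["jaguar", "jaguar animal", "jaguar habitat"])
print(predict(["jaguar"]))         # ['jaguar car', 'jaguar animal']
```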
Slobodan Beliga and Sanda Martincic-Ipsic.
Network-Enabled Keyword Extraction for Under-Resourced Languages.
Abstract:
In this paper we discuss the advantages of network-enabled keyword extraction from texts in under-resourced languages. Network-enabled methods are briefly introduced, while the focus of the paper is placed on the difficulties such methods must overcome when dealing with content in under-resourced languages, which mainly manifest as a lack of natural language processing resources (corpora and tools). Additionally, the paper discusses how the lack of NLP tools can be circumvented with a network-enabled method such as the SBKE method.
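To illustrate why such methods suit under-resourced languages, the sketch below ranks words using only a structural measure on the co-occurrence network (node strength divided by degree, i.e. average link weight, in the spirit of SBKE's selectivity) and no linguistic tools at all; the toy text is invented.

```python
# Rank words by selectivity (strength/degree) in a co-occurrence network.
from collections import Counter

text = "mali jezik veliki problem mali jezik mali resurs".split()
edges = Counter(zip(text, text[1:]))      # weighted adjacent-word edges

strength, degree = Counter(), Counter()
for (a, b), w in edges.items():
    for node in (a, b):
        strength[node] += w               # sum of incident link weights
        degree[node] += 1                 # number of distinct neighbours

selectivity = {n: strength[n] / degree[n] for n in strength}
print(sorted(selectivity, key=selectivity.get, reverse=True)[:3])
```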
Stavroula Bampatzia, Omar Gustavo Bravo-Quezada, Angeliki Antoniou, Martin Lopez Nores, Manolis Wallace, George Lepouras and Costas Vasilakis.
The use of semantics in the CrossCult H2020 project.
Abstract:
CrossCult is a newly started project that aims to make reflective history a reality in the European cultural context. In this paper we examine how the project intends to take advantage of advances in semantic technologies in order to achieve its goals. Specifically, we describe the quest for reflection and, through practical examples from two of the project's flagship pilots, explain how semantics can assist in this direction.