Spring 2014 MC and WG Meetings


“Semantic keyword search in big data”


1. Topic

Since large-scale data sources usually comprise a very large schema and billions of instances, keyword search over such datasets can suffer from query ambiguity and scalability problems. The discovery of suitable (i.e., semantically related) data sources is another critical issue, hindered by the lack of sufficient information on available datasets and endpoints. Browsing and searching data on such a scale is not an easy task for users. Semantic search aims to leverage semantics to improve the accuracy and recall of search mechanisms. Whereas state-of-the-art keyword search techniques work well for small or medium-sized databases in a particular domain, many of them fail to scale to heterogeneous databases composed of several thousand tables.

2. Goal

  1. [RESEARCH] Definition of open challenges in keyword search over structured databases, with particular reference to large and heterogeneous sources (through keynote talks and round-table discussions)
  2. [NETWORKING] Identification of research topics studied by research groups involved in KEYSTONE
  3. [TASKS] Analysis of the H2020 calls of interest


3. Program

March 24

16.30-18.30 [NETWORKING] Session I – Presentation Session

19.00-22.00 [NETWORKING] Dinner

March 25

9.00-10.15 [TASKS] Session II – H2020 opportunities

10.15-10.45 Coffee Break

10.45-13.00 [RESEARCH] Session III – Research topics in Big Data

13.00-14.00 Lunch

14.00-15.15 [RESEARCH] Session IV – Panel: Evaluation of Keyword Search Systems

15.15-15.30 Coffee Break

15.30-17.00 [NETWORKING] Session V – Defining Open Challenges in Keyword-based Search in Big Data

17.00-17.30 Conclusions



Session I: Presentation Session [March 24: 16:30-18:30]

Chair: Francesco Guerra

Speaker(s): Francesco Guerra, Jorge Cardoso and … all the attendees


  1. Presentation of the network: what it means to participate in a COST Action, its activities, what we have done so far, and what we plan to do. [20 minutes]
  2. Presentation Session: each participating unit will be asked to present its group in 1-2 slides. [1 hour]
  3. Brainwriting session “Open challenges in Keyword-based Search Systems in Big Data” – part 1: attendees will take part in a session to define the open challenges in the area. [30 minutes]


Session II: H2020 calls about big data [March 25: 09:00-10:15]


  • Live presentation from the ICT H2020 team
  • Bert van Werkhoven, Agentschap NL, contact for the ICT programme in Horizon 2020

Goal: Presentation of H2020 calls about big data


Session III: Research Challenges in Big Data [March 25: 10:45-13:00]


  • Keynote 1: Edgar Meij: Web-scale semantic search at Yahoo
  • Keynote 2: Pedro Furtado: Scalability and Realtime for BigData
  • Keynote 3: Djoerd Hiemstra: Federated search for real: combining 150 search engines, and counting

Goal: Identify the main open challenges in the area through three keynotes, followed by a discussion with the audience.


Session IV: Panel – The evaluation of keyword search systems [March 25: 14:00-15:15]

Chair: Nicola Ferro (University of Padua)


  • Maarten de Rijke, University of Amsterdam (NL)
  • Claudia Hauff, TU Delft (NL)
  • Martin Theobald, University of Antwerp (BE)
  • Arjen de Vries, CWI Amsterdam (NL)

Session V: Brainwriting – Open Challenges in Keyword-based Search in Big Data [March 25: 15:30-17:00]

Chair: Jorge Cardoso

Goal: The results of part 1 of the brainwriting session will be analyzed and discussed with the attendees. The goal is to produce a white paper on the open issues in “Semantic keyword-based search in big data”.



4. Venue and Accommodation

Location: Hampshire Hotel Fitland – Level Leiden

Arriving by plane or train: Amsterdam Schiphol Airport is approximately 30 km from the centre of Leiden. Regular trains (4 per hour) connect the airport to Leiden Central Station (€ 5.40 one way; the trip takes about 20 minutes).

Accommodations: list of some hotels

Note: A major event (the Nuclear Security Summit) will take place in The Hague on 24 and 25 March. As a result, traveling by car between Amsterdam, Schiphol Airport, Noordwijk, Leiden, and The Hague on those days will be very difficult. The same applies on 23 March, when there will be security checks on those routes. No delays are expected when traveling by train. More information is available at https://www.nss2014.com/en

If you are going to arrive by car: visitors from abroad traveling by car on 24-25 March can park near the train station of Alphen aan den Rijn (also spelled Alphen a/d Rijn) at the Q-Park on Dr. A.D. Sacharovlaan (the street name). Parking for 2 days costs € 11.50. From Alphen a/d Rijn, the train ride to Leiden Centraal takes 15 minutes and costs € 3.30 one way. Coming from Alphen a/d Rijn, Leiden Centraal is the second stop in Leiden; the first is Leiden Lammenschans. Depending on the time of day, there are 2 or 4 trains per hour.

Parking tickets can be bought in advance, but all information on Q-Park and its web shop is in Dutch.


Free parking at a train station is available on the same train line, one stop before Alphen a/d Rijn, in the town of Bodegraven next to the A12 motorway. However, these parking spaces are outdoors and unattended. At Bodegraven, follow P+R Stationsplein from the motorway. A train ticket from Bodegraven to Leiden Centraal costs € 4.60 one way.

5. How to claim reimbursement (invited participants only)

  1. You must be registered in the eCost system
  2. Summary of eligible expenses
  3. IMPORTANT: note on how to avoid tax deductions





(Credits: Antonio Farina and Paolo Missier)