WG MEETING
“Semantic keyword search in big data”
1. Topic
Since large-scale data sources usually consist of a very large schema and billions of instances, keyword search over such datasets can suffer from query ambiguity and scalability challenges. The discovery of suitable (i.e. semantically related) data sources is another critical issue, hindered by the lack of sufficient information on available datasets and endpoints. Browsing and searching data on such a scale is not an easy task for users. Semantic search aims at leveraging semantics to improve the accuracy and recall of search mechanisms. Whereas state-of-the-art keyword search techniques work well for small or medium-sized databases in a particular domain, many of them fail to scale to heterogeneous databases composed of several thousand tables.
2. Goal
- [RESEARCH] Definition of open challenges in keyword search over structured databases, with particular reference to large and heterogeneous sources (by means of keynote talks and round table discussions)
- [NETWORKING] Identification of research topics studied by research groups involved in KEYSTONE
- [TASKS] Analysis of the H2020 calls of interest
3. Program
March 24
16.30-18.30 [NETWORKING] Session I – Presentation Session
19.00-22.00 [NETWORKING] Dinner
March 25
9.00-10.15 [TASKS] Session II – H2020 opportunities
10.15-10.45 Coffee Break
10.45-13.00 [RESEARCH] Session III – Research topics in Big Data
13.00-14.00 Lunch
14.00-15.15 [RESEARCH] Session IV – Panel: Evaluation of Keyword Search Systems
15.15-15.30 Coffee Break
15.30-17.00 [NETWORKING] Session V – Defining Open Challenges in Keyword-based Search in Big Data
17.00-17.30 Conclusions
Details
Session I: Presentation Session [March 24: 16:30-18:30]
Chair: Francesco Guerra
Speaker(s): Francesco Guerra, Jorge Cardoso and … all the attendees
Goal:
- Presentation of the network: what it means to participate in a COST action, its activities, what we have done so far, and what we plan to do. [20 minutes]
- Presentation Session: The participating units will be asked to present their groups in 1-2 slides [1 hour]
- Brainwriting session “Open challenges in Keyword-based Search Systems in Big Data” – part 1: The attendees will be asked to take part in a session for defining the open challenges in the area [30 minutes]
Session II: H2020 calls about big data [March 25: 09:00-10:15]
Speaker(s):
- Live presentation from ICT H2020 Team
- Bert van Werkhoven, NL Agentschapps contact for the ICT program in Horizon 2020
Goal: Presentation of H2020 calls about big data
Session III: Research Challenges in Big Data [March 25: 10:45-13:00]
Speaker(s):
- Keynote 1: Edgar Meij: Web-scale semantic search at Yahoo
- Keynote 2: Pedro Furtado: Scalability and Realtime for BigData
- Keynote 3: Djoerd Hiemstra: Federated search for real: combining 150 search engines, and counting
Goal: Identifying the main open challenges in the area through 3 keynotes, each followed by discussion with the audience.
Session IV Panel: The evaluation of keyword search systems [March 25: 14:00-15:15]
Chair: Nicola Ferro (University of Padua)
Participants:
- Maarten de Rijke, University of Amsterdam (NL)
- Claudia Hauff, TU Delft (NL)
- Martin Theobald, University of Antwerp (BE)
- Arjen de Vries, CWI Amsterdam (NL)
Session V Brainwriting: Open Challenges in Keyword-based Search in Big Data [March 25: 15:30-17:00]
Chair: Jorge Cardoso
Goal: The results of part 1 of the brainwriting session will be analyzed and discussed with the attendees. The goal is the “creation” of a white paper on the open issues in “Semantic keyword-based search in big data”.
program – under construction (only WG chairs can modify, everyone can view), invited people – under construction
4. Venue and Accommodation
Location: Hampshire Hotel Fitland – Level Leiden
Arriving by plane or train: Amsterdam Schiphol Airport is approximately 30 km from the centre of Leiden. Regular trains (4 per hour) connect the airport to Leiden Central Station (€ 5.40 for a one-way trip of about 20 minutes). See here for details.
Accommodations: list of some hotels
Note: An important event (the Nuclear Security Summit) will take place in The Hague on 24 and 25 March. This means that traveling by car between Amsterdam, Schiphol Airport, Noordwijk, Leiden and The Hague on those days will be very difficult. The same applies on 23 March, when there will be security checks on those routes. No delays are expected when traveling by train. More information is available at https://www.nss2014.com/en
If you are going to arrive by car: An alternative for people traveling from abroad by car on 24-25 March is to park near the train station of Alphen aan den Rijn (also written Alphen a/d Rijn) at the Q Park on the street Dr. A.D. Sacharovlaan. Parking for 2 days costs € 11.50. From Alphen a/d Rijn, the train ride to Leiden Centraal takes 15 minutes and costs € 3.30 one way. Coming from Alphen a/d Rijn, Leiden Centraal is the second stop in Leiden; the first is Leiden Lammenschans. Depending on the time of day, there are 2 or 4 trains per hour.
Parking tickets can be bought in advance, but all information on Q Park and its webshop is in Dutch:
http://www.q-park.nl/nl/parkeren-bij-q-park/per-stad/alphen-a/d-rijn/p-r-sacharovlaan
Free parking at a train station is available on the same train line, one stop before Alphen a/d Rijn, in the town of Bodegraven, next to the A12 motorway. However, these parking places are outdoors and unattended. At Bodegraven, follow the signs for P+R Stationsplein from the motorway. A train ticket from Bodegraven to Leiden Centraal costs € 4.60 one way.
5. How to claim a reimbursement (only for invited people):
- You have to be registered in the eCost system
- Summary of eligible expenses
- IMPORTANT Note about how to avoid tax deductions
MATERIAL
- Presentation of the participating members / teams
- Brain-Writing Session. Result I and result II
Photos
(credits Antonio Farina and Paolo Missier)