Geocoding for texts with fine-grain toponyms: an experiment on a geoparsed hiking descriptions corpus

Authors: Ludovic Moncla, Walter Renteria-Agualimpia, Javier Nogueras-Iso, Mauro Gaio
Year: 2014
Venue: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2014), Dallas, Texas, USA, 4-7 November 2014.
Product of the Action: Yes

Keystone Members Authors:

Geoparsing and geocoding are two essential middleware services that support end-user applications such as location-aware search and other location-based services. The objective of this work is to propose a processing chain for geoparsing and geocoding text documents that describe events strongly linked to space and that make frequent use of fine-grain toponyms. The geoparsing part is a Natural Language Processing approach that combines part-of-speech information with syntactico-semantic patterns (a cascade of transducers). However, the real novelty of this work lies in the geocoding method. The geocoding algorithm is unsupervised and takes advantage of clustering techniques to disambiguate the toponyms found in gazetteers and, at the same time, to estimate the spatial footprint of the fine-grain toponyms not found in gazetteers. The feasibility of the proposal has been tested on a corpus of hiking descriptions in French, Spanish and Italian.
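
The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not name the clustering technique, so this example assumes a simple distance-threshold (single-linkage) grouping of gazetteer candidates, keeps the cluster that covers the most distinct toponyms, and assigns the bounding box of that cluster as a rough footprint for toponyms missing from the gazetteer. The function names, the `eps_km` threshold, the place names and the coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster_points(points, eps_km):
    """Assumed technique: chain together points lying within eps_km of each other."""
    labels = [-1] * len(points)
    current = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(points)):
                if labels[k] == -1 and haversine_km(points[j], points[k]) <= eps_km:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels


def geocode(toponyms, gazetteer, eps_km=10.0):
    """Return (resolved, footprints) for the toponyms of one document.

    resolved:   toponym -> chosen (lat, lon) among its gazetteer candidates
    footprints: toponym absent from the gazetteer -> (min_lat, min_lon, max_lat, max_lon)
    """
    # One entry per (toponym, candidate location) pair.
    entries = [(name, coord) for name in toponyms for coord in gazetteer.get(name, [])]
    if not entries:
        return {}, {}
    labels = cluster_points([coord for _, coord in entries], eps_km)

    # Keep the cluster that covers the largest number of distinct toponyms.
    names_in = defaultdict(set)
    for (name, _), label in zip(entries, labels):
        names_in[label].add(name)
    best = max(names_in, key=lambda label: len(names_in[label]))

    # Disambiguate: take each toponym's first candidate inside the best cluster.
    resolved = {}
    for (name, coord), label in zip(entries, labels):
        if label == best and name not in resolved:
            resolved[name] = coord

    # Rough footprint for unknown toponyms: bounding box of the resolved points.
    lats = [c[0] for c in resolved.values()]
    lons = [c[1] for c in resolved.values()]
    bbox = (min(lats), min(lons), max(lats), max(lons))
    footprints = {name: bbox for name in toponyms if name not in gazetteer}
    return resolved, footprints


# Hypothetical gazetteer: invented names and coordinates, for illustration only.
GAZETTEER = {
    "Lac Vert": [(45.00, 6.00), (43.00, 1.00)],
    "Col du Loup": [(45.03, 6.02), (48.00, 7.00)],
}
TOPONYMS = ["Lac Vert", "Col du Loup", "Cabane du Berger"]

resolved, footprints = geocode(TOPONYMS, GAZETTEER)
print(resolved)
print(footprints)
```

In this toy input each ambiguous name has one candidate near the other name's candidate, so the shared cluster wins and the unknown "Cabane du Berger" inherits the box around the two resolved points. A real processing chain would likely use the spatial relations extracted during geoparsing to narrow that footprint further.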