Extraction of keyphrases from single document based on hierarchical concepts

Authors: Miroslav Smatana, Peter Butka
Year: 2016
Venue: IEEE 14th International Symposium on Applied Machine Intelligence and Informatics, 93-98
Link: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7422988&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7422988
Product of the Action: No

Abstract:
In this paper we present a modification of approaches for extracting keyphrases from a single textual document (without external information), based on hierarchical concepts built from the text of that document. To create the hierarchical concepts we use a method from the area of Formal Concept Analysis (FCA), which organizes objects into a concept lattice (a structure of hierarchically organized clusters known as formal concepts) based on the similarity of their attributes. In our case FCA is applied as follows. The input document is preprocessed; the extracted objects are sentences or paragraphs, and the attributes are the frequencies of terms in the particular objects. From this input data a conceptual model is created using an FCA-based algorithm known as the generalized one-sided concept lattice. The hierarchical concepts from this model are then used to extract keyphrases for the document. Our approach is experimentally tested on selected manually annotated documents and compared to standard keyphrase extraction methods, whose results it improved thanks to the use of hierarchical concepts from concept-based analysis.
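
The pipeline in the abstract (sentences as objects, term frequencies as attributes, a one-sided concept lattice built on top) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the function names (`term_frequency_table`, `one_sided_intents`, `extent`), and the choice of the per-term maximum frequency as the top value of each attribute scale are all assumptions made for illustration. The sketch relies only on the standard property that concept intents in a one-sided lattice are closed under componentwise minimum; the paper's preprocessing and keyphrase scoring are not reproduced here.

```python
import re
from collections import Counter


def term_frequency_table(sentences):
    """Return (terms, rows); rows[i][j] is the count of terms[j] in sentence i.

    Assumption: a naive lowercase alphabetic tokenizer stands in for the
    paper's preprocessing, which the abstract does not specify.
    """
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    terms = sorted({t for tokens in tokenized for t in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append(tuple(counts[t] for t in terms))
    return terms, rows


def one_sided_intents(rows):
    """Collect all intents by closing the object rows under componentwise min.

    Assumption: each attribute scale is a chain whose top element is the
    largest frequency observed for that term.
    """
    n_attrs = len(rows[0])
    top = tuple(max(row[j] for row in rows) for j in range(n_attrs))
    intents = {top}  # intent of the empty set of objects
    for row in rows:
        # Meet every known intent with the new object row.
        meets = {tuple(min(a, b) for a, b in zip(intent, row))
                 for intent in intents}
        intents |= meets
    return intents


def extent(intent, rows):
    """Indices of objects whose frequencies dominate the intent in every term."""
    return frozenset(
        i for i, row in enumerate(rows)
        if all(value >= bound for value, bound in zip(row, intent))
    )


# Hypothetical three-sentence "document" used only to exercise the sketch.
sentences = [
    "concept lattice",
    "concept lattice lattice",
    "keyphrase extraction",
]
terms, rows = term_frequency_table(sentences)
intents = one_sided_intents(rows)

# Print each concept as (sentence indices, terms with non-zero frequency).
for intent in sorted(intents, key=lambda g: -len(extent(g, rows))):
    shared = {t: v for t, v in zip(terms, intent) if v > 0}
    print(sorted(extent(intent, rows)), shared)
```

Each printed pair is one formal concept: a group of sentences together with the term frequencies they all reach. Concepts with larger extents sit higher in the hierarchy, and the terms in their intents are natural keyphrase candidates, although how the paper actually ranks them is not stated in the abstract.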