A new text representation scheme combining Bag-of-Words and Bag-of-Concepts approaches for automatic text classification

Authors: Alaa Alahmadi; Arash Joorabchi; Abdulhussain E. Mahdi
Year: 2013
Venue: In Proceedings of the 7th IEEE GCC Conference and Exhibition (IEEE GCC 2013), pp. 108-113, Doha, Qatar, 17-20 November 2013.
Link: http://dx.doi.org/10.1109/IEEEGCC.2013.6705759
Product of the Action: No

This paper introduces a new approach to creating text representations and applies it to standard text classification collections. The approach is based on supplementing the well-known Bag-of-Words (BOW) representational scheme with a concept-based representation that utilises Wikipedia as a knowledge base. The proposed representations are used to generate a Vector Space Model, which in turn is fed into a Support Vector Machine classifier to categorise a collection of textual documents from two publicly available datasets. We report experimental results evaluating the performance of our model in comparison with a standard BOW scheme, a concept-based scheme, and recently reported similar text representations that augment the standard BOW approach with concept-based representations.
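
The general idea of supplementing a BOW vector with concept features can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONCEPT_MAP` dictionary is a hypothetical stand-in for a Wikipedia-based concept annotator, the `w:`/`c:` feature prefixes and all function names are assumptions made for illustration, and raw counts are used in place of whatever term weighting the paper applies.

```python
from collections import Counter

# Hypothetical phrase-to-concept map; a stand-in for a Wikipedia-based annotator.
CONCEPT_MAP = {
    "support vector machine": "Support_vector_machine",
    "text classification": "Document_classification",
    "neural network": "Artificial_neural_network",
}


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def bag_of_concepts(text):
    """Count occurrences of known phrases and map them to concept labels."""
    normalised = " ".join(text.lower().split())
    counts = Counter()
    for phrase, concept in CONCEPT_MAP.items():
        occurrences = normalised.count(phrase)
        if occurrences:
            counts[concept] += occurrences
    return counts


def combined_features(text):
    """Merge word and concept counts into one feature space (prefixes avoid clashes)."""
    features = Counter()
    for word, count in bag_of_words(text).items():
        features["w:" + word] = count
    for concept, count in bag_of_concepts(text).items():
        features["c:" + concept] = count
    return features


def vectorize(docs):
    """Build a shared vocabulary and one count vector per document."""
    feats = [combined_features(doc) for doc in docs]
    vocab = sorted(set().union(*feats))
    matrix = [[f.get(term, 0) for term in vocab] for f in feats]
    return vocab, matrix


docs = [
    "A support vector machine for text classification",
    "A neural network",
]
vocab, matrix = vectorize(docs)
```

Each row of `matrix` is one document in the combined vector space. Rows of this kind could then be passed to an SVM classifier, which is omitted here to keep the sketch free of third-party dependencies.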