Combining Bag-of-Words and Bag-of-Concepts Representations for Arabic Text Classification

Authors: Alaa Alahmadi; Arash Joorabchi; Abdulhussain E. Mahdi
Year: 2014
Venue: In Proceedings of the 25th IET Irish Signals & Systems Conference 2014 and 2014 China-Ireland International Conference on Information and Communications Technologies (ISSC 2014/CIICT 2014), pp. 343-348, Limerick, Ireland, 26-27 June 2014.
Product of the Action: No

This paper introduces a set of new text representation approaches for the automatic classification of Arabic textual documents. These approaches combine the well-known Bag-of-Words (BOW) and Bag-of-Concepts (BOC) text representation schemes and use Wikipedia as a knowledge base. The proposed representations are used to generate a vector space model, which in turn is fed into a classifier to categorize a collection of Arabic textual documents. Three different machine-learning-based classifiers are used in this work. The performance of the proposed text representation models is evaluated against a standard BOW scheme, a concept-based scheme, and recently reported similar text representation schemes that augment the standard BOW with the BOC.
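
As a rough illustration of the general idea of combining BOW and BOC features into one vector space model, the sketch below concatenates word counts and concept counts per document. It is not the paper's method: the abstract does not specify how the two representations are combined, so plain concatenation is an assumption. The `CONCEPTS` dictionary is a hypothetical stand-in for a Wikipedia-based concept annotator, the function names are invented for this example, and English tokens are used only for readability.

```python
from collections import Counter

# Hypothetical phrase-to-concept map; a real system would look concepts up
# in a knowledge base such as Wikipedia rather than in a fixed dictionary.
CONCEPTS = {
    ("machine", "learning"): "Machine_learning",
    ("text", "classification"): "Document_classification",
    ("document", "classification"): "Document_classification",
}


def bag_of_words(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def bag_of_concepts(tokens, concepts=CONCEPTS):
    """Count concepts matched by adjacent token pairs (toy annotator)."""
    found = Counter()
    for i in range(len(tokens) - 1):
        concept = concepts.get((tokens[i], tokens[i + 1]))
        if concept:
            found[concept] += 1
    return found


def combined_features(text):
    """Merge word and concept counts; prefixes keep the two spaces apart."""
    tokens = text.lower().split()
    features = Counter()
    for word, n in bag_of_words(tokens).items():
        features["w:" + word] = n
    for concept, n in bag_of_concepts(tokens).items():
        features["c:" + concept] = n
    return features


def vectorize(docs):
    """Build a shared feature vocabulary and one count vector per document."""
    bags = [combined_features(d) for d in docs]
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[f] for f in vocab] for bag in bags]


if __name__ == "__main__":
    docs = [
        "machine learning for text classification",
        "document classification with machine learning",
    ]
    vocab, vectors = vectorize(docs)
    for name, column in zip(vocab, zip(*vectors)):
        print(name, column)
```

In this toy setup the two documents share no word for "text" versus "document", yet both activate the same `c:Document_classification` feature, which is the kind of overlap a concept-based representation is meant to capture. The resulting vectors could then be passed to any standard classifier.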