The Impact of Feature Extraction and Selection on SMS Spam Filtering

Authors

  • A. K. Uysal, Anadolu University
  • S. Gunal, Anadolu University
  • S. Ergin, Eskisehir Osmangazi University
  • E. Sora Gunal, Eskisehir Osmangazi University

DOI:

https://doi.org/10.5755/j01.eee.19.5.1829

Keywords:

Feature extraction, feature selection, SMS, spam filter

Abstract

This paper investigates the impact of several feature extraction and feature selection approaches on the filtering of short message service (SMS) spam messages in two languages, Turkish and English. The full feature set of the filtering framework consists of features originating from the bag-of-words (BoW) model together with an ensemble of structural features (SF) specific to the spam problem. The distinctive BoW features are identified using information-theoretic feature selection methods. Various combinations of the BoW features and SFs are then fed into widely used pattern classification algorithms to classify SMS messages. The filtering framework is evaluated on both Turkish and English SMS message datasets. For this purpose, the first publicly available Turkish SMS message collection was compiled as part of the study. Comprehensive experimental analysis on the respective datasets revealed that combinations of BoW features and SFs, rather than BoW features alone, provide better classification performance on both datasets. The effectiveness of the feature selection methods, however, differs slightly between the two languages.
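The pipeline outlined in the abstract (BoW features ranked by an information-theoretic criterion, then combined with structural features) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: information gain stands in for the unnamed selection methods, the three structural features (message length, digit count, URL flag) are hypothetical, and the four toy messages are invented purely for illustration. The classification step is left out.

```python
import math
import re

def tokenize(text):
    # Lowercase and split into alphanumeric tokens (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def entropy(counts):
    # Shannon entropy (in bits) of a class-count distribution.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(term, docs, labels):
    # Information gain of binary term presence with respect to the class label.
    classes = sorted(set(labels))
    base = entropy([labels.count(c) for c in classes])
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
    conditional = 0.0
    for subset in (present, absent):
        if subset:
            weight = len(subset) / len(labels)
            conditional += weight * entropy([subset.count(c) for c in classes])
    return base - conditional

def select_terms(messages, labels, k):
    # Rank vocabulary terms by information gain; ties are broken alphabetically.
    docs = [set(tokenize(m)) for m in messages]
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(t, docs, labels), t))
    return ranked[:k]

def structural_features(text):
    # Hypothetical structural features: length, digit count, URL indicator.
    lowered = text.lower()
    has_url = int("http" in lowered or "www." in lowered)
    return [len(text), sum(ch.isdigit() for ch in text), has_url]

def featurize(text, terms):
    # Binary BoW vector over the selected terms, followed by structural features.
    tokens = set(tokenize(text))
    return [int(t in tokens) for t in terms] + structural_features(text)

# Toy data, invented for illustration only.
messages = [
    "WIN a free prize now call 08001234",
    "free entry win cash now",
    "are you coming home for dinner",
    "see you at the meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

if __name__ == "__main__":
    terms = select_terms(messages, labels, k=2)
    print(terms)
    print(featurize("free cash now", terms))
```

The resulting vectors could be passed to any standard classifier. Keeping the BoW part and the structural part as separate list segments makes it easy to evaluate each feature group alone or in combination, which is the kind of comparison the abstract describes.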

Author Biographies

A. K. Uysal, Anadolu University

Department of Computer Engineering

S. Gunal, Anadolu University

Department of Computer Engineering

S. Ergin, Eskisehir Osmangazi University

Department of Electrical and Electronics Engineering

E. Sora Gunal, Eskisehir Osmangazi University

Department of Electrical and Electronics Engineering

Published

2013-05-01

How to Cite

Uysal, A. K., Gunal, S., Ergin, S., & Sora Gunal, E. (2013). The Impact of Feature Extraction and Selection on SMS Spam Filtering. Elektronika Ir Elektrotechnika, 19(5), 67-72. https://doi.org/10.5755/j01.eee.19.5.1829

Section

SYSTEM ENGINEERING, COMPUTER TECHNOLOGY