A new feature selection method based on frequent and associated itemsets for text classification

Bibliographic Details
Published in: Concurrency and computation, 2022-11, Vol. 34 (25), p. n/a
Main Authors: Mamdouh Farghaly, Heba; Abd El‐Hafeez, Tarek
Format: Article
Language: English
Description
Summary: Feature selection is one of the major issues in pattern recognition. The quality of the selected features is important for classification because low‐quality data can degrade model construction performance. Because selected features often contain redundant information that is difficult to handle, this article applies association analysis theory from data mining to select important features. In this study, a novel feature selection method based on frequent and associated itemsets (FS‐FAI) for text classification is proposed. FS‐FAI seeks to find relevant features and also takes feature interaction into account. Moreover, it uses association as a metric to evaluate the relationship between the target concept and feature(s). To evaluate the efficacy of the proposed method, several experiments were conducted on a BBC dataset from the BBC news website and the SMS Spam Collection dataset from the UCI machine learning repository. The obtained results were compared with those of well‐known feature selection methods. The reported results demonstrated the effectiveness of the proposed method in selecting high‐quality features and in handling redundant information in text classification.
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.7258