
The Uniformization and the Feature Selection about the Inconsistent Classification Data Set


Bibliographic Details
Main Authors: Xin-ling Wu, Dong-feng He, Guo-qiang Zhou
Format: Conference Proceeding
Language: English
Description
Summary: Inconsistent records and redundant attributes in a sample data set reduce classification quality and efficiency. This paper proposes a method that makes a classification data set consistent and selects a smallest set of feature variables. Based on Bayes' formula, the method groups inconsistent data under their most likely category to make the data set uniform. A category distinction matrix is then built from the uniform data set, and the smallest feature variable subset that can accurately distinguish the categories is obtained from that matrix. A heuristic search strategy is given for selecting the feature variables. Experimental results on several UCI standard data sets show that the proposed method can eliminate the inconsistency of the sample data set, select the optimal feature variables, and effectively reduce the dimensionality of the data.
ISSN: 2155-6083, 2155-6091
DOI: 10.1109/GCIS.2009.299
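
The three steps named in the summary (making the data consistent, building a category distinction matrix, and heuristic feature selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no algorithmic detail, so every function name here is hypothetical. The sketch assumes that "most likely category" means the most frequent label among records with identical feature values (the empirical posterior), and it swaps in a plain greedy covering heuristic for the paper's unspecified search strategy. A greedy choice usually gives a small subset but does not guarantee the smallest one.

```python
from collections import Counter, defaultdict
from itertools import combinations


def make_consistent(rows, labels):
    """Give each group of identical feature vectors its most frequent label."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    # Assumption: the most frequent label in a group maximizes the
    # empirical posterior P(category | features).
    best = {row: counts.most_common(1)[0][0] for row, counts in groups.items()}
    unique_rows = list(best)
    return unique_rows, [best[row] for row in unique_rows]


def distinction_matrix(rows, labels):
    """For each pair of rows in different categories, record the differing features."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            diff = frozenset(
                k for k, (a, b) in enumerate(zip(rows[i], rows[j])) if a != b
            )
            if diff:  # identical rows cannot be separated by any feature
                entries.append(diff)
    return entries


def greedy_select(entries):
    """Assumed heuristic: repeatedly pick the feature that separates the most pairs."""
    remaining = list(entries)
    selected = []
    while remaining:
        counts = Counter(f for entry in remaining for f in entry)
        feature = max(sorted(counts), key=counts.get)  # ties go to the lowest index
        selected.append(feature)
        remaining = [e for e in remaining if feature not in e]
    return sorted(selected)


# Toy data: the first three records share features but disagree on the label.
rows = [(0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 1, 0), (1, 1, 0), (1, 0, 0)]
labels = ["a", "a", "b", "a", "b", "b"]

clean_rows, clean_labels = make_consistent(rows, labels)
features = greedy_select(distinction_matrix(clean_rows, clean_labels))
print(clean_rows, clean_labels, features)
```

On this toy data the conflicting records collapse into a single record labeled "a", and feature 0 alone separates the two categories, so the other two features are dropped.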