Rough set model based feature selection for mixed-type data with feature space decomposition
Published in: | Expert Systems with Applications, 2018-08, Vol. 103, pp. 196-205 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | •Interpretability of feature selection for mixed-type data is increased. •No transformation procedure is needed for categorical or numerical features. •FSMSD selects features without bias toward either data type. •FSMSD and benchmark methods are compared on 15 mixed-type datasets.
Feature selection plays an important role in classification problems associated with expert and intelligent systems. The central idea behind feature selection is to identify important input features in order to reduce the dimensionality of the input space while maintaining or improving classification performance. Traditional feature selection approaches were designed to handle either categorical or numerical features, but not the mix of both that often arises in real datasets. In this paper, we propose a novel rough set model based feature selection algorithm for classifying mixed-type data, called feature selection for mixed-type data with feature space decomposition (FSMSD). FSMSD handles both categorical and numerical features by combining rough set theory with a heterogeneous Euclidean-overlap metric (HEOM), so it can be applied directly to mixed-type data. It also uses feature space decomposition to preserve the properties of multi-valued categorical features, thereby reducing information loss and preserving the features’ physical meaning. The proposed algorithm was compared with four benchmark methods on real mixed-type datasets and biomedical datasets; its performance was promising, suggesting that it can be helpful to users of expert and intelligent systems. |
---|---|
ISSN: | 0957-4174, 1873-6793 |
DOI: | 10.1016/j.eswa.2018.03.010 |
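The abstract's core idea, rough set theory driven by a heterogeneous Euclidean-overlap metric, can be illustrated with a neighborhood rough set sketch. This is an illustration under stated assumptions, not FSMSD itself: the neighborhood radius `delta`, the dependency criterion (share of samples whose HEOM neighborhood is label-pure), the greedy forward search, and all function names (`heom`, `feature_ranges`, `dependency`, `greedy_select`) are hypothetical choices made for this example. HEOM follows its usual definition: overlap distance (0 if equal, 1 otherwise) for categorical values, range-normalized absolute difference for numerical values, and 1 when either value is missing.

```python
import math
from typing import Any, List, Sequence, Tuple

# Each row mixes numbers (numerical features), strings (categorical features)
# and None (missing values); no encoding or discretization is applied.
Row = Sequence[Any]


def heom(x: Row, y: Row, feats: Sequence[int],
         is_cat: Sequence[bool], ranges: Sequence[float]) -> float:
    """Heterogeneous Euclidean-overlap metric over the feature indices in `feats`."""
    total = 0.0
    for a in feats:
        xa, ya = x[a], y[a]
        if xa is None or ya is None:
            d = 1.0  # missing value: maximal per-feature distance
        elif is_cat[a]:
            d = 0.0 if xa == ya else 1.0  # overlap distance for categorical values
        elif ranges[a] > 0:
            d = abs(xa - ya) / ranges[a]  # range-normalized numerical difference
        else:
            d = 0.0  # constant numerical feature carries no distance
        total += d * d
    return math.sqrt(total)


def feature_ranges(data: Sequence[Row], is_cat: Sequence[bool]) -> List[float]:
    """Observed max - min of each numerical feature (0.0 for categorical ones)."""
    ranges = []
    for a, cat in enumerate(is_cat):
        vals = [row[a] for row in data if row[a] is not None] if not cat else []
        ranges.append(max(vals) - min(vals) if vals else 0.0)
    return ranges


def dependency(data, labels, feats, is_cat, ranges, delta) -> float:
    """Hypothetical criterion: share of samples whose delta-neighborhood is label-pure."""
    if not feats:
        return 0.0
    pos = 0
    for i, xi in enumerate(data):
        neighbors = [j for j, xj in enumerate(data)
                     if heom(xi, xj, feats, is_cat, ranges) <= delta]
        # xi is in the positive region if all of its neighbors share its label
        if all(labels[j] == labels[i] for j in neighbors):
            pos += 1
    return pos / len(data)


def greedy_select(data, labels, is_cat, delta=0.3, eps=1e-12) -> Tuple[List[int], float]:
    """Greedy forward search: add the feature that most increases dependency."""
    ranges = feature_ranges(data, is_cat)
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(is_cat)))
    while remaining:
        scores = {a: dependency(data, labels, selected + [a], is_cat, ranges, delta)
                  for a in remaining}
        a_best = max(scores, key=scores.get)  # ties go to the lowest remaining index
        if scores[a_best] <= best + eps:
            break  # no remaining feature improves dependency
        selected.append(a_best)
        best = scores[a_best]
        remaining.remove(a_best)
    return selected, best
```

For example, `greedy_select([(1.0, "a", "x"), (1.2, "b", "x"), (5.0, "a", "y"), (5.3, "b", "y")], [0, 0, 1, 1], [False, True, True])` returns `([0], 1.0)`: the numerical feature 0 alone already makes every neighborhood label-pure, so the search stops. Numbers and category strings sit side by side in each row, which mirrors the claim that no transformation of either data type is required.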
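The abstract does not spell out how feature space decomposition works, so the following is only one plausible reading, labeled as an assumption: each multi-valued categorical feature (more than two observed values) is split into one binary indicator per category value, and every new column remembers which original feature it came from. That bookkeeping lets a selector evaluate individual category values while results are still reported in terms of the original, physically meaningful features. The names `decompose` and `group_back`, and the choice to leave binary and numerical features untouched, are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def decompose(data: Sequence[Sequence[Any]], is_cat: Sequence[bool]):
    """Split multi-valued categorical features into binary indicator columns.

    Returns (new_rows, new_is_cat, columns), where columns[k] is
    (original feature index, category value) for an indicator column,
    or (original feature index, None) for a column kept unchanged.
    """
    columns: List[Tuple[int, Any]] = []
    for a, cat in enumerate(is_cat):
        values = sorted({row[a] for row in data if row[a] is not None}, key=str) if cat else []
        if len(values) > 2:
            columns.extend((a, v) for v in values)  # one indicator per category value
        else:
            columns.append((a, None))  # numerical or binary feature: keep as is
    new_rows = []
    for row in data:
        new_row = []
        for a, v in columns:
            if v is None:
                new_row.append(row[a])
            elif row[a] is None:
                new_row.append(None)  # a missing value stays missing in every indicator
            else:
                new_row.append("1" if row[a] == v else "0")  # indicators stay categorical
        new_rows.append(new_row)
    new_is_cat = [is_cat[a] for a, _ in columns]
    return new_rows, new_is_cat, columns


def group_back(selected_columns: Sequence[int], columns: Sequence[Tuple[int, Any]]) -> List[int]:
    """Map selected decomposed columns back to the original feature indices."""
    return sorted({columns[c][0] for c in selected_columns})
```

For instance, `decompose([(1.0, "red"), (2.0, "green"), (3.0, "blue")], [False, True])` yields the columns `[(0, None), (1, "blue"), (1, "green"), (1, "red")]`, and selecting only the `"red"` indicator maps back to original feature 1 through `group_back`.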