
A novel SMOTE-based resampling technique trough noise detection and the boosting procedure

Bibliographic Details
Published in:Expert systems with applications 2022-08, Vol.200, p.117023, Article 117023
Main Authors: Sağlam, Fatih; Cengiz, Mehmet Ali
Format: Article
Language:English
Description
Summary:
•The presence of noise in a data set misguides classifiers when the data set is resampled by SMOTE, as more noise is generated.
•The number of links in SMOTE is chosen arbitrarily and is the same for every observation.
•We propose a new noise detection method to be applied before SMOTE to prevent noise generation.
•We also propose a new approach to select the number of links automatically in SMOTE.
•The proposed SMOTEWB method outperforms SMOTE in linear and nonlinear classifiers in the presence of noise.

Most classification methods assume that the classes contain balanced numbers of observations. When they do not, models are fitted with a bias toward the class with more observations. As a result, classifiers tend to ignore the class with fewer observations, and predictions are biased toward the majority class. Specific performance measures are recommended for imbalanced datasets, along with approaches for solving the class imbalance problem. Resampling is one of the most widely used of these approaches. This study discusses the difficulties of two oversampling methods: random oversampling (ROS) and the synthetic minority oversampling technique (SMOTE). It proposes a combination of a new noise detection method and SMOTE to overcome those difficulties. The proposed SMOTE with boosting (SMOTEWB) method detects noise using the boosting procedure from ensemble algorithms, then uses this information to determine the appropriate number of neighbors for each observation within the SMOTE algorithm.
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2022.117023
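
The summary describes SMOTE with a neighbor count chosen per observation rather than one global value. The sketch below illustrates only that interpolation step; it is not the authors' SMOTEWB implementation. The function name `smote_per_point_k` and the input `k_per_point` are hypothetical: here the per-observation counts are supplied by hand, whereas the paper derives them from its boosting-based noise detection, which this record does not specify.

```python
import numpy as np

def smote_per_point_k(X_min, k_per_point, n_new, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    X_min       : (n, d) array of minority-class observations (n >= 2).
    k_per_point : length-n integer array (hypothetical input); entry i is the
                  number of nearest minority neighbors usable for point i.
                  A value of 0 excludes the point, e.g. if it was flagged
                  as noise by some earlier step.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k_per_point = np.asarray(k_per_point, dtype=int)

    # Pairwise Euclidean distances among minority points; exclude self-matches.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)  # each row: neighbors, nearest first

    eligible = np.flatnonzero(k_per_point > 0)
    if eligible.size == 0:
        raise ValueError("no observation has k > 0")

    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.choice(eligible)                   # pick a seed observation
        k = min(k_per_point[i], len(X_min) - 1)    # cap k at available neighbors
        nb = order[i, rng.integers(k)]             # one of its k nearest neighbors
        gap = rng.random()                         # position along the segment
        synthetic[j] = X_min[i] + gap * (X_min[nb] - X_min[i])
    return synthetic
```

For example, with `X_min = [[0, 0], [1, 0], [0, 1], [10, 10]]` and `k_per_point = [2, 2, 2, 0]`, the outlying fourth point is never used as a seed, and because it is not among the two nearest neighbors of any other point, no synthetic sample is drawn toward it.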