Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain
Published in: | Expert Systems with Applications, Vol. 141, Article 112918, March 2020 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • Novel methods that combine deep neural networks with cost-sensitive and ensemble learning.
• The methods are compared with 12 other methods on 6 large real-life imbalanced data sets.
• They outperform existing methods in generalization performance.
• They obtain low generalization gaps and can avoid overfitting.
• CSDE gives excellent results on data sets across a wide range of imbalance ratios.
Standard classification algorithms assume that the class distribution of the data is roughly balanced. The class imbalance problem, however, often occurs in real-life applications such as direct marketing, fraud detection and churn prediction. It refers to the situation in which the number of examples belonging to one class is significantly higher than that of the others. When a standard classifier is trained on class-imbalanced data, it is usually biased toward the majority class. In this work, we propose two novel cost-sensitive methods to address the class imbalance problem: the Cost-Sensitive Deep Neural Network (CSDNN) and the Cost-Sensitive Deep Neural Network Ensemble (CSDE). CSDNN is a cost-sensitive version of Stacked Denoising Autoencoders. CSDE is an ensemble learning version of CSDNN; it applies random undersampling and layer-wise feature extraction from the hidden layers of the deep neural network to improve generalization performance over CSDNN. Various methods for handling the class imbalance problem have been proposed in the literature. However, the experiments in those studies were usually conducted on relatively small data sets and on artificial data, so the performance of those methods on modern real-life data sets, which are more complicated, is unclear. In our experiments, we examine the performance of the proposed methods and the other methods using six large real-life data sets from different business domains, ranging from direct marketing, churn prediction and default payment to firm fraud detection. The results show that the proposed methods obtain promising results in handling the class imbalance problem and outperform all the other compared methods. |
---|---|
ISSN: | 0957-4174, 1873-6793 |
DOI: | 10.1016/j.eswa.2019.112918 |
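The abstract describes CSDE only at a high level: a cost-sensitive base learner, trained repeatedly on randomly undersampled data and combined into an ensemble. The sketch below illustrates those generic ideas under loud assumptions. Each member is a plain logistic regression standing in for the paper's cost-sensitive stacked denoising autoencoder (CSDNN). The misclassification costs `cost_fn`/`cost_fp` are hypothetical. Members are combined by averaging probabilities. The layer-wise hidden-layer feature extraction used in CSDE is not reproduced. All function names are illustrative, not taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_bce(y_true, p_pred, cost_fn=5.0, cost_fp=1.0, eps=1e-12):
    """Cost-sensitive binary cross-entropy (assumed form).

    Errors on the minority (positive) class are scaled by cost_fn and errors on
    the majority (negative) class by cost_fp. The default costs are
    illustrative assumptions, not values taken from the paper.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= cost_fn * y * math.log(p) + cost_fp * (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def train_member(X, y, cost_fn=5.0, cost_fp=1.0, lr=0.1, epochs=200):
    """Hypothetical stand-in base learner: logistic regression fitted by batch
    gradient descent on weighted_bce. The paper's CSDNN is a cost-sensitive
    stacked denoising autoencoder; this is only a simplified substitute."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            cost = cost_fn if yi == 1 else cost_fp
            err = cost * (p - yi)  # d(weighted BCE)/d(logit) for this example
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def random_undersample(X, y, rng):
    """Keep every minority example plus an equal-size random draw of majority ones."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    idx = pos + rng.sample(neg, min(len(pos), len(neg)))
    return [X[i] for i in idx], [y[i] for i in idx]


def train_ensemble(X, y, n_members=5, seed=0, **member_kwargs):
    """Train each member on its own undersampled, roughly balanced subset."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        Xs, ys = random_undersample(X, y, rng)
        members.append(train_member(Xs, ys, **member_kwargs))
    return members


def predict_proba(members, x):
    """Average the members' predicted probabilities of the minority class."""
    probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for w, b in members]
    return sum(probs) / len(probs)
```

Typical use would be `members = train_ensemble(X, y)` followed by `predict_proba(members, x_new)`, with labels coded as 1 for the minority class. Undersampling gives each member a balanced view of the data, while the cost weights further penalize missed minority examples; how the paper balances these two mechanisms is not stated in the abstract.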