Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain

Bibliographic Details
Published in: Expert Systems with Applications, 2020-03, Vol. 141, Article 112918
Main Authors: Wong, Man Leung; Seng, Kruy; Wong, Pak Kan
Format: Article
Language: English
Description
Summary:
• Novel methods that use deep neural networks, cost-sensitive learning, and ensemble learning.
• The methods are compared with 12 methods on 6 large real-life imbalanced data sets.
• They outperform existing methods in generalization performance.
• They obtain low generalization gaps and can avoid overfitting.
• CSDE gives excellent results on data sets across a wide range of imbalance ratios.

Standard classification algorithms assume that the class distribution of the data is roughly balanced. The class imbalance problem often occurs in real-life applications such as direct marketing, fraud detection, and churn prediction. It refers to the situation in which the number of examples belonging to one class is significantly higher than that of the others. When a standard classifier is trained on class-imbalanced data, it is usually biased toward the majority class. In this work, we propose two novel cost-sensitive methods to address the class imbalance problem, namely Cost-Sensitive Deep Neural Network (CSDNN) and Cost-Sensitive Deep Neural Network Ensemble (CSDE). CSDNN is a cost-sensitive version of Stacked Denoising Autoencoders. CSDE is an ensemble learning version of CSDNN. Random undersampling and layer-wise feature extraction from the hidden layers of the deep neural network are applied in CSDE to improve the generalization performance over CSDNN. Various methods for handling the class imbalance problem have been proposed in the literature. However, the experiments discussed in those studies were usually conducted on relatively small data sets and on artificial data. The performance of those methods on modern real-life data sets, which are more complicated, is unclear. In our experiments, we examine the performance of our proposed methods and the other methods using six large real-life data sets from different business domains: direct marketing, churn prediction, default payment, and firm fraud detection. The results show that the proposed methods obtain promising results in handling the class imbalance problem and outperform all the other compared methods.
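
The two general ideas named in the summary, cost-sensitive learning and random undersampling for ensembles, can be illustrated with a minimal Python sketch. This is not the authors' CSDNN or CSDE: the record does not give their cost scheme or network details, so the inverse-frequency costs, the balanced bag size, and the function names `class_costs`, `weighted_log_loss`, and `undersampled_bags` are all assumptions made for illustration.

```python
import math
import random


def class_costs(labels):
    """Inverse-frequency misclassification costs (an assumed scheme, not the paper's)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_log_loss(labels, probs, costs):
    """Cost-weighted binary cross-entropy; probs[i] is the predicted P(y_i = 1)."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -costs[y] * (math.log(p) if y == 1 else math.log(1 - p))
    return total / len(labels)


def undersampled_bags(labels, n_bags, seed=0):
    """Index sets with every minority example plus an equal-size random majority draw."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [
        sorted(minority + rng.sample(majority, len(minority)))
        for _ in range(n_bags)
    ]


# Toy data with a 9:1 imbalance ratio (hypothetical, not from the paper).
labels = [0] * 90 + [1] * 10
costs = class_costs(labels)
bags = undersampled_bags(labels, n_bags=5)
uninformed_loss = weighted_log_loss(labels, [0.5] * len(labels), costs)
```

With these assumed costs, an error on a minority example is penalized nine times as heavily as an error on a majority example, and each bag gives one ensemble member a balanced training subset. In an ensemble of this kind, each member would be trained on one bag and the members' predictions combined, for example by averaging.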
ISSN: 0957-4174; 1873-6793
DOI: 10.1016/j.eswa.2019.112918