Cross Project Defect Prediction via Balanced Distribution Adaptation Based Transfer Learning

Bibliographic Details
Published in:	Journal of Computer Science and Technology, 2019-09, Vol. 34 (5), p. 1039-1062
Main Authors:	Xu, Zhou; Pang, Shuai; Zhang, Tao; Luo, Xia-Pu; Liu, Jin; Tang, Yu-Tian; Yu, Xiao; Xue, Lei
Format:	Article
Language:	English
Subjects:	Adaptation; Artificial Intelligence; Computer Science; Data Structures and Information Theory; Datasets; Defects; Indicators; Information Systems Applications (incl. Internet); Learning; Modules; Performance evaluation; Regular Paper; Software Engineering; Theory of Computation

Description
Summary:	Defect prediction supports the rational allocation of testing resources by detecting potentially defective software modules before a product is released. When a project has no historical labeled defect data, cross project defect prediction (CPDP) is an alternative: it uses labeled defect data from an external project to build a classification model that predicts module labels for the current project. Transfer learning based CPDP methods are the current mainstream. In general, such methods aim to minimize the distribution differences between the data of the two projects. However, previous methods mainly focus on the marginal distribution difference and ignore the conditional distribution difference, which leads to unsatisfactory performance. In this work, we use a novel balanced distribution adaptation (BDA) based transfer learning method to narrow this gap. BDA considers both kinds of distribution difference simultaneously and adaptively assigns different weights to them. To evaluate the effectiveness of BDA for CPDP, we conduct experiments on 18 projects from four datasets using six indicators (i.e., F-measure, g-means, Balance, AUC, EARecall, and EAF-measure). Compared with 12 baseline methods, BDA achieves average improvements of 23.8%, 12.5%, 11.5%, 4.7%, 34.2%, and 33.7% on the six indicators, respectively, over the four datasets.
ISSN:	1000-9000 1860-4749
DOI:	10.1007/s11390-019-1959-z
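
The summary describes BDA as weighting the marginal and the conditional distribution difference between the two projects. A minimal Python sketch of such a weighted distance follows. It is an illustration under stated assumptions, not the authors' implementation: the function names (linear_mmd, balanced_distance) are hypothetical, the distribution distance is approximated by the squared difference of feature means (maximum mean discrepancy with a linear kernel), target labels are assumed to be pseudo-labels from some base classifier, and the balance factor mu is passed in by the caller rather than chosen adaptively.

```python
import numpy as np


def linear_mmd(a, b):
    """Squared distance between feature means (MMD with a linear kernel)."""
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))


def balanced_distance(xs, ys, xt, yt_pseudo, mu):
    """Weighted sum of marginal and class-conditional distances.

    xs, ys     : source-project features and labels
    xt         : target-project features
    yt_pseudo  : assumed pseudo-labels for the target project
    mu         : balance factor in [0, 1]; 0 = marginal only, 1 = conditional only
    """
    marginal = linear_mmd(xs, xt)

    conditional = 0.0
    for c in np.unique(ys):
        s_c = xs[ys == c]
        t_c = xt[yt_pseudo == c]
        # Skip classes that are missing on either side.
        if len(s_c) > 0 and len(t_c) > 0:
            conditional += linear_mmd(s_c, t_c)

    return (1.0 - mu) * marginal + mu * conditional


# Toy data: two modules per project, one non-defective (0) and one defective (1).
xs = np.array([[0.0, 0.0], [2.0, 2.0]])
ys = np.array([0, 1])
xt = np.array([[1.0, 1.0], [3.0, 3.0]])
yt_pseudo = np.array([0, 1])

d_balanced = balanced_distance(xs, ys, xt, yt_pseudo, mu=0.5)
```

With mu = 0 the sketch reduces to a purely marginal comparison, the case the summary attributes to previous methods; larger values of mu shift the weight toward the per-class (conditional) difference.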