Research on Software Defect Prediction Technology with Incomplete Supervision

Posted on: 2021-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Zhang
Full Text: PDF
GTID: 2518306512987619
Subject: Computer technology
Abstract/Summary:
With the development of computer technology, the scale and complexity of software keep increasing, and the economic losses and testing costs caused by software defects are very high. Software defect prediction can help prioritize testing effort by predicting defect-prone modules. However, an effective defect prediction model often requires abundant labeled defect-prone instances, which hinders the application of defect prediction. If unlabeled instances or existing labeled data from other projects could be used, the cost of applying defect prediction models could be reduced. From the perspective of weakly supervised learning, this thesis studies software defect prediction with incomplete supervision. The main contents are as follows.

First, effort-aware defect prediction based on semi-supervised learning. Just-in-time (JIT) defect prediction has gained considerable interest because it enables developers to identify risky changes at check-in time. Since the labels of changes are hard to acquire, a prediction model that does not rely heavily on label information is more desirable in practice. However, the unsupervised models proposed in previous work perform poorly in classification scenarios because they lack supervised information. This thesis studies JIT defect prediction from a semi-supervised perspective and proposes an effort-aware tri-training model based on sample selection. Experimental results on six open-source projects show that the effort-aware tri-training method performs better than existing JIT defect prediction methods.

Second, cross-project defect prediction with heterogeneous metrics. Heterogeneous defect prediction has attracted great interest because it does not require the source and target projects to share the same set of metrics. Existing heterogeneous defect prediction models use naive or traditional machine learning methods to learn feature representations between the source and target projects and perform prediction based on them. The feature representations learned in previous studies are weak, which leads to poor performance in predicting defect-prone instances. Given the strong feature extraction and representation capabilities of deep neural networks, this thesis proposes a feature representation method for heterogeneous defect prediction based on variational autoencoders. By combining the variational autoencoder with maximum mean discrepancy, the method learns a common feature representation of the source and target projects, on which a defect prediction model can then be trained. The validity of the proposed method is verified by comparing it with traditional cross-project defect prediction methods and heterogeneous defect prediction methods on various datasets.

Finally, a cross-project software defect prediction system based on the variational autoencoder. To assess the quality of the learned feature representation, this thesis implements a universal cross-project software defect prediction system based on the proposed cross-project method. The system can handle cross-project defect prediction problems with both homogeneous and heterogeneous metrics, and it can also show how the data distribution changes during training on the two projects. The displayed feature distribution further verifies the effectiveness of the proposed method.
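The abstract names maximum mean discrepancy (MMD) as the quantity combined with the variational autoencoder, but it does not give a formula. As an illustration only, the sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel between two sets of latent codes. The kernel choice, the `gamma` value, the function names, and the toy 2-D codes are all assumptions made for this example and are not taken from the thesis. The sketch presumes both projects have already been encoded into a latent space of the same dimension, which is what would allow projects with different metric sets to be compared.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two samples.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]
    """
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


# Hypothetical 2-D latent codes, standing in for encoder outputs.
source_codes = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
near_target = [[0.1, 0.0], [0.0, 0.2], [-0.2, -0.1]]
far_target = [[3.0, 3.1], [2.9, 3.2], [3.1, 2.8]]

mmd_near = mmd_squared(source_codes, near_target)
mmd_far = mmd_squared(source_codes, far_target)
print(mmd_near, mmd_far)
```

In this toy setting the value for the overlapping samples is much smaller than the value for the well-separated ones. A training objective that adds such a term to the autoencoder loss would push the two projects' latent distributions toward each other; how the thesis weights or schedules that term is not stated in the abstract.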
Keywords/Search Tags: defect prediction, weakly supervised, effort-aware, heterogeneous metrics