
Research On Feature Selection Method For Software Defect Prediction

Posted on: 2019-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Lin
Full Text: PDF
GTID: 2417330545489975
Subject: Statistics
Abstract/Summary:
In today's information society, software products have penetrated every aspect of people's lives. As software grows in scale and complexity, the requirements for its reliability have also increased. Software defects are unavoidable during development, yet the labor cost of software testing is high and development cycles are tight. Accurately finding the fault-prone modules in the time available is therefore key both to keeping development on schedule and to improving software reliability. The main purpose of Software Defect Prediction (SDP) is to predict whether a module in a software product is fault-prone: given software metrics and historical defect information, it predicts whether a new module is fault-prone. Accurate defect prediction supports the rational allocation of limited resources and the timely repair of defects, which saves development costs and improves software quality.

Currently, SDP faces two major challenges:

(1) Irrelevant and redundant features in the software metric data. SDP builds a prediction model by mining historical data sets, but not every feature of these data helps predict defects. Irrelevant and redundant features not only slow down the prediction algorithm but may also reduce prediction accuracy.

(2) Class imbalance. SDP is a two-class problem in which software modules are divided into fault-prone and non-fault-prone. Fault-prone modules tend to make up only a small part of a software system, yet they are precisely the modules of interest. If a fault-prone module is mistaken for a non-fault-prone one, it can cause serious consequences such as system failures.

To address these two problems, this thesis studies feature selection for imbalanced data in depth. The main contributions are as follows:

1. A multivariate filter feature selection algorithm based on data sampling is proposed. First, the sampling methods SMOTE (Synthetic Minority Oversampling Technique) and ENN (Edited Nearest Neighbor) resample the data set to balance the classes. Second, the multivariate filter algorithms CFS (Correlation-based Feature Selection) and FCBF (Fast Correlation-Based Filter) select features and eliminate irrelevant and redundant ones. The proposed algorithm is evaluated on NASA test data sets. The experimental results show that it effectively improves the prediction performance of SDP, which in turn helps improve the quality and reliability of software.

2. A cost-sensitive hybrid feature selection algorithm is proposed. First, cost-sensitive information is introduced into the filter stage through CSVS (Cost-Sensitive Variance Score), CSLS (Cost-Sensitive Laplacian Score) and CSCS (Cost-Sensitive Constraint Score) to address the class imbalance problem. Second, the complementarity between filter and wrapper algorithms is exploited: the SFFS (Sequential Forward Floating Selection) algorithm improves the performance of the feature subset produced by the filter stage. Experimental results on NASA test data sets show that the cost-sensitive hybrid feature selection algorithm not only effectively improves the prediction accuracy of SDP for the minority class (the fault-prone modules) but also improves the overall classification performance of SDP.
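The resampling step of the first contribution (SMOTE followed by ENN) can be illustrated with a minimal sketch. This is not the thesis's implementation, which the abstract does not give. The function names `nearest`, `smote` and `enn`, the parameter defaults, and the toy data are all assumptions made for illustration, and the subsequent CFS/FCBF feature selection step is not shown. The sketch assumes numeric features, Euclidean distance, two classes, and at least two minority rows.

```python
import math
import random
from collections import Counter


def nearest(idx, points, k):
    """Indices of the (up to) k points closest to points[idx], excluding idx."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority=1, k=5, seed=0):
    """Append synthetic minority rows until both classes have equal counts.

    Each synthetic row is a random point on the segment between a minority
    row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    n_new = (len(y) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(minority_rows))
        j = rng.choice(nearest(i, minority_rows, k))
        u = rng.random()  # interpolation weight in [0, 1)
        base, neighbour = minority_rows[i], minority_rows[j]
        X_out.append([a + u * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Keep only rows whose label matches the majority vote of their k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): label 0 = non-fault-prone, label 1 = fault-prone.
# The row [5.1, 5.1] is a non-fault-prone module sitting inside the
# fault-prone cluster, i.e. the kind of noisy row ENN is meant to remove.
X_toy = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [-0.2, 0.1], [0.3, -0.1],
         [-0.1, -0.2], [0.0, 0.4], [5.1, 5.1],
         [5.0, 5.0], [5.4, 5.2], [4.8, 5.5]]
y_toy = [0] * 8 + [1] * 3

X_balanced, y_balanced = smote(X_toy, y_toy, minority=1, k=2)
X_cleaned, y_cleaned = enn(X_balanced, y_balanced, k=3)
print(Counter(y_toy), Counter(y_balanced), Counter(y_cleaned))
```

Here ENN is applied to both classes after oversampling, which is one common way to combine the two methods; the abstract does not say whether the thesis cleans both classes or only the majority class. The brute-force neighbour search is quadratic in the number of rows and is only meant for small examples.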
Keywords/Search Tags: software defect prediction, multivariate filter algorithm, data sampling, cost-sensitive learning, SFFS algorithm