| In recent years, software defect prediction has become popular in the fields of software quality assurance and software maintenance. Because within-project defect prediction depends heavily on a project's own data sets, it is difficult to train an effective prediction model for a new software project. In this case, a feasible solution is to train the prediction model on the historical data of other projects, which is known as cross-project software defect prediction (CPDP). Many previous studies treat CPDP as a binary classification problem or a regression problem. Such approaches only predict the defect proneness of a given software entity (such as a class or module), which is a considerable limitation, especially when the development schedule is tight and human resources are scarce. For software developers and testers, ranking information about software entities is particularly important, because it lets them decide objectively which entities to improve and repair first. However, few studies have been reported in this area. Based on an analysis and summary of machine learning and statistical methods, this paper presents a systematic study of cross-project software defect prediction. The main work and contributions of this paper are summarized as follows: (1) We define cross-project software defect prediction as a ranking problem. Inspired by the point-wise approach of Learning to Rank (LTR), we propose a ranking-oriented cross-project software defect prediction method, called ROCPDP. The method ranks software entities according to the number of defects they contain. To obtain accurate ranking results, we train a multiple linear regression model using gradient descent optimization. Because the high dimensionality of software defect data increases the training cost of the model and may cause over-fitting, the proposed method uses PCA to reduce the number of features. To make gradient descent converge quickly, we apply Z-score normalization to the features (software metrics) before model training. (2) To verify the effectiveness of ROCPDP for ranking, we carried out several experiments. A case study on data sets collected from AEEEM and PROMISE shows that ROCPDP outperforms the eight baseline methods in one-to-one and many-to-one CPDP scenarios. In addition, in the many-to-one scenarios, ROCPDP is, by and large, comparable to the best baseline method trained in a specific within-project defect prediction scenario. |
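
The pipeline summarized in contribution (1) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the learning rate, the number of epochs, the number of retained components `k`, and the order of the steps (Z-score first, then PCA, then regression) are all assumptions made for illustration. The sketch also assumes that target-project metrics are normalized with the source project's statistics, which the abstract does not specify.

```python
import numpy as np


def zscore_fit(X):
    """Return per-metric mean and standard deviation of the source data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant metrics
    return mean, std


def pca_fit(Z, k):
    """Return the top-k principal directions of already centered data Z."""
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return vt[:k].T  # shape: (n_features, k)


def fit_linear_gd(X, y, lr=0.01, epochs=5000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y
        w -= lr * (X.T @ residual) / n
        b -= lr * residual.mean()
    return w, b


def rank_entities(X_source, y_source, X_target, k=2):
    """Rank target entities by predicted defect count, highest first.

    Assumption: the target project is normalized and projected with
    statistics learned from the source project only.
    """
    mean, std = zscore_fit(X_source)
    Z_source = (X_source - mean) / std
    Z_target = (X_target - mean) / std

    components = pca_fit(Z_source, k)
    w, b = fit_linear_gd(Z_source @ components, y_source)

    scores = (Z_target @ components) @ w + b
    order = np.argsort(-scores)  # indices of entities, most defect-prone first
    return order, scores
```

Calling `rank_entities(X_source, y_source, X_target, k)` with a source-project metric matrix, its defect counts, and a target-project metric matrix returns the target entities' indices ordered from most to least defect-prone, together with the predicted scores. Because this is a point-wise approach, only the relative order of the scores matters for the ranking.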