As software systems grow more complex, defects become inevitable, and these defects degrade software quality. This paper studies software defect prediction using machine learning. To address the high dimensionality, complexity, and severe class imbalance of software defect prediction data, it proposes a defect prediction approach based on feature dimensionality reduction and cost-sensitive learning, combined with an extreme learning machine (ELM). The main contributions are as follows. First, to handle the high dimensionality and complexity of software defect datasets, a data preprocessing algorithm, DLFDR (Double Layer Feature Dimensionality Reduction), is proposed that integrates feature selection and feature extraction. Working on the normalized dataset, the algorithm first applies a wrapper-based feature selection method, recursive feature elimination with random forest as the underlying iterative model, to remove redundant features, and selects the optimal feature subset through cross-validation. KPCA (Kernel Principal Component Analysis) is then used to extract features from the dataset, reducing the dimensionality further. Second, to address the class imbalance of software defect prediction datasets, a weight calculation method based on sample distance is proposed, following the idea of cost-sensitive learning. The algorithm determines the local weight of each sample from its distance to other samples of the same class, and a new weight function is designed to compute the global weight of each sample. This increases the weight of minority-class samples while reducing the influence of outliers and noisy samples. Third, a software defect prediction method based on cost-sensitive learning is proposed. The method preprocesses the dataset with the DLFDR algorithm and combines the global sample weight matrix with the extreme learning machine for sample-weighted prediction, yielding a software defect prediction model that is sensitive to the minority class. Finally, the proposed algorithms are evaluated on the NASA public software defect datasets, the most commonly used datasets in the field of software defect prediction. The proposed method is compared with baseline methods and with methods from other researchers; the experimental results are analyzed and verify the effectiveness of the proposed algorithms.
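The sample-weighted prediction step can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the weight function, so `distance_weights` uses a hypothetical stand-in (inverse class frequency scaled by a damping factor based on the mean distance to the `k` nearest same-class samples). The ELM uses the standard weighted regularized least-squares solution, and the function names, `k`, `n_hidden`, and `C` are all illustrative choices.

```python
import numpy as np


def distance_weights(X, y, k=5):
    """Hypothetical per-sample weights (NOT the paper's weight function).

    Local weight: shrinks as a sample's mean distance to its k nearest
    same-class neighbours grows, damping outliers and noisy samples.
    Global weight: local weight divided by class size, so minority-class
    samples weigh more. Weights are rescaled to have mean 1.
    """
    w = np.empty(len(y), dtype=float)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        Xc = X[idx]
        # Pairwise Euclidean distances within the class.
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        d.sort(axis=1)  # column 0 is the zero self-distance
        kk = min(k, len(idx) - 1)
        mean_d = d[:, 1:kk + 1].mean(axis=1) if kk > 0 else np.zeros(len(idx))
        local = 1.0 / (1.0 + mean_d / (np.median(mean_d) + 1e-12))
        w[idx] = local / len(idx)
    return w / w.mean()


def _hidden(X, A, b):
    # Sigmoid activations of the random hidden layer.
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))


def fit_weighted_elm(X, y, w, n_hidden=50, C=1.0, seed=0):
    """Weighted ELM: random hidden layer, closed-form output weights.

    Solves (I / C + H^T W H) beta = H^T W t, with W = diag(w) and
    targets t in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
    b = rng.normal(size=n_hidden)                # random biases
    H = _hidden(X, A, b)
    t = np.where(y == 1, 1.0, -1.0)
    HtW = H.T * w  # equals H^T @ diag(w)
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ t)
    return A, b, beta


def predict_elm(model, X):
    """Return 1 (defective) or 0 (non-defective) for each row of X."""
    A, b, beta = model
    return (_hidden(X, A, b) @ beta > 0).astype(int)
```

In use, `X` would be the feature matrix after dimensionality reduction and `y` the 0/1 defect labels: call `w = distance_weights(X, y)`, then `model = fit_weighted_elm(X, y, w)`, then `predict_elm(model, X_new)`. The per-sample weights enter only through the diagonal matrix `W`, which is how the cost-sensitive emphasis on the minority class reaches the otherwise standard ELM solution.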