Research On Class Imbalanced Data Generation Method For Software Defect Prediction

Posted on: 2020-01-06

Degree: Master

Type: Thesis

Country: China

Candidate: X Y Zhang

GTID: 2428330602461593

Subject: Computer Science and Technology

Abstract/Summary:

Software defect prediction identifies potentially defective software modules by analyzing historical software data with classification and ranking models. When a defect prediction model is built, defective samples are far fewer than non-defective ones and are unevenly distributed. This produces severe between-class and within-class imbalance, which degrades the resulting model. To reduce the effect of imbalance on the classifier, corrective methods exist at each of the four stages of building a defect prediction model: data sampling, feature extraction, classifier optimization, and evaluation criteria. Data sampling is the first stage, and correcting imbalance there directly reduces the complexity of the later stages. Commonly used sampling methods for the class imbalance problem balance the classes by adjusting the number of samples, but the resulting data usually follow the original distribution, so within-class balance is not improved.

Focusing on the sample distribution, this thesis proposes a class-imbalanced data generation method for software defect prediction. The samples are first partitioned by clustering according to their distribution in the feature space. Different strategies are then used to synthesize defective samples in each sub-region according to its distribution. Increasing the number of defective samples balances the defective and non-defective classes, and generating data at different densities in different regions improves the within-class distribution of the defective samples.

To verify the effectiveness of the proposed method, experiments are carried out on nine public defect prediction data sets, comparing it with existing data generation methods under several classification algorithms. The results show that, by partitioning the samples and applying different data generation strategies in regions with different distributions, the proposed method improves classifier performance and reduces the impact of data imbalance on software defect prediction results.
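The partition-then-generate idea can be illustrated with a short Python sketch. It is not the thesis's implementation: the abstract does not name the clustering algorithm or the per-region strategies, so every choice below is an assumption made for illustration. The sketch clusters only the defective samples with plain k-means, always grows the currently smallest cluster so that sub-region sizes move toward within-class balance, synthesizes samples in clusters with at least two members by a simplified SMOTE-style interpolation (between two random members of the same cluster rather than between nearest neighbours), and makes jittered copies of isolated samples. The names `kmeans` and `cluster_oversample` and the parameters `k` and `jitter` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_oversample(defective, n_new, k=3, jitter=0.05, seed=0):
    """Hypothetical helper: return n_new synthetic defective samples, cluster by cluster."""
    rng = random.Random(seed)
    k = min(k, len(defective))
    labels = kmeans(defective, k, seed=seed)
    clusters = [[p for p, lab in zip(defective, labels) if lab == c] for c in range(k)]
    clusters = [members for members in clusters if members]  # drop empty clusters
    counts = [len(members) for members in clusters]
    synthetic = []
    for _ in range(n_new):
        # Assumed allocation rule: grow the currently smallest sub-region first,
        # pushing cluster sizes toward within-class balance.
        c = min(range(len(clusters)), key=lambda i: counts[i])
        members = clusters[c]
        if len(members) >= 2:
            # Simplified SMOTE-style step: a random point on the segment
            # between two original members of the same cluster.
            a, b = rng.sample(members, 2)
            gap = rng.random()
            new = [x + gap * (y - x) for x, y in zip(a, b)]
        else:
            # Isolated sample: a jittered copy instead of interpolation.
            new = [x + rng.gauss(0.0, jitter) for x in members[0]]
        synthetic.append(new)
        counts[c] += 1
    return synthetic


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D data: six dense defective samples near (0, 0), two sparse ones near (5, 5).
    defective = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(6)]
    defective += [(5.0, 5.0), (5.5, 4.8)]
    n_non_defective = 20
    new = cluster_oversample(defective, n_non_defective - len(defective), k=2)
    print(len(defective) + len(new))  # 20, matching the non-defective count
```

Because each synthetic point is interpolated inside a single cluster, new data stay within the sub-region they came from, and the greedy allocation gives small clusters proportionally more new samples than a single SMOTE pass over the whole minority class would.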

Keywords/Search Tags:

software defect prediction, imbalanced data, data sampling, data generation, clustering

Related items

1	Research Of The Software Defect Prediction Method For Imbalanced Data
2	Research On Software Defect Prediction Model For High Dimensional And Imbalanced Data
3	Research And System Construction Of Data Preprocessing Mechanism In Software Defect Prediction
4	Research On Software Defect Prediction Based On Hybrid Sampling And Integrated Learning
5	Research On User Analysis And Behavior Prediction Driven By Big Data
6	Software Defect Prediction Strategy Design For Imbalanced Data
7	Software Defect Prediction Model Driven By Imbalanced Datasets
8	Research On Imbalanced Data Processing In Software Defect Prediction
9	Research On Software Defect Prediction Technology Based On Data Mining
10	Data Distribution-driven Adaptive Hybrid Sampling Method For Imbalanced Data Processing