Study on Software Bug Prediction Algorithms

Posted on: 2018-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: D S A L A H U D D I N Hu
Full Text: PDF
GTID: 2348330536481651
Subject: Computer Science and Technology
Abstract/Summary:
Software bug prediction is a prominent topic in software engineering research. Its goal is to predict whether a piece of software contains bugs by analyzing software metrics with machine learning, which helps developers improve software quality. Software bug prediction is essentially a binary classification problem, and its results depend on the software metrics data and on the classifier used. Many studies have applied various classifiers and data preprocessing techniques to software bug prediction in order to improve accuracy. However, the questions “How effective are classical classifiers?” and “Which data preprocessing techniques can improve the performance of software defect prediction?” have not been answered clearly, so an empirical analysis comparing these approaches is needed. In software bug prediction, the class of interest is the defective modules, yet historical data contain considerably fewer defective modules than non-defective ones. This causes the class-imbalance problem, which hampers prediction accuracy, so overcoming class imbalance is fundamental to bug prediction.

In this thesis, we propose an experimental framework and use the software defect data from the NASA MDP data sets to conduct an empirical analysis of software bug prediction. Five research questions are defined and analyzed experimentally. First, the effectiveness of typical classifiers is compared: Bayes net, naïve Bayes, Logistic, SimpleLogistic, SMO, IBk, AdaBoostM1, Bagging, classification via regression, decision table, J48, random forest and random tree. Next, we apply data preprocessing techniques, namely propositionalization using a decision tree, feature selection, and principal component analysis, to obtain better results. After preprocessing, we apply SMOTE to address the class-imbalance problem and analyze its performance under different settings.

The experimental results show that, when no preprocessing technique is applied, J48, IBk, Bayes net and random tree are superior to the other classifiers, and classification via regression is the worst. Among the data preprocessing techniques, propositionalization performs better than the other two. Because class imbalance affects the accuracy of software bug prediction, we analyzed SMOTE, an over-sampling algorithm, and examined experimentally how much it improves accuracy depending on the number of nearest neighbors and on the percentage of minority-class samples added. The results show that the TP rate and the AUC index grow as the percentage of added minority-class samples increases and when the number of nearest neighbors is greater than 1. The TP rate and the AUC index are highest when the number of neighbors is 2 and 3, respectively.

Finally, this research studies and improves the SMOTE method. We propose a technique called ASMO to overcome two shortcomings of SMOTE. First, SMOTE generates the same number of synthetic samples for each original minority sample without considering the distribution of the neighboring samples, which increases the likelihood of overlap between classes and leads to over-generalization. Second, the number of samples to generate is predetermined, so rebalancing lacks flexibility. The ASMO algorithm aims to solve the over-generalization problem and to make rebalancing more flexible through multiple tests. The experimental results show that the TP rate and AUC index of random forest are more significant than those of the other classifiers when the ASMO algorithm is applied with classification via regression, Logistic, IBk and random forest.
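
As a rough illustration of the over-sampling step discussed in the abstract, the following Python sketch shows the basic SMOTE idea: each minority (defective) sample is interpolated toward one of its k nearest minority neighbors, and the over-sampling percentage controls how many synthetic samples are produced per original sample. The function name `smote`, its parameters, and the toy data are illustrative assumptions rather than the implementation used in the thesis. The ASMO variant is not sketched because the abstract does not specify its procedure.

```python
import math
import random


def smote(minority, k=3, percentage=100, seed=0):
    """Return synthetic minority samples (a simplified SMOTE sketch).

    minority   : list of feature vectors (lists of floats) of the minority class
    k          : number of nearest neighbours considered for each sample
    percentage : amount of over-sampling; 100 yields one synthetic sample per
                 original sample, 200 yields two, and so on (in this sketch
                 any remainder below a multiple of 100 is ignored)
    seed       : seed for the random generator, for repeatable output
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)      # cannot have more neighbours than samples
    per_sample = percentage // 100     # the same count for every sample
    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority samples by Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbours = others[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()         # position on the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class: four hypothetical "defective module" metric vectors.
defective = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote(defective, k=2, percentage=200)
print(len(new_samples))  # 8: two synthetic samples per original sample
```

The fixed `per_sample` count for every minority sample corresponds to the first shortcoming named in the abstract: the local distribution of neighboring samples plays no role in how many synthetic points are generated.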
Keywords/Search Tags: Classification, Defect Prediction, Data Preprocessing, Defect Model, Classifier, Propositionalization, SMOTE