
Research On Ensemble Learning AdaBoost And Classifiers Fusion

Posted on: 2023-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhu
Full Text: PDF
GTID: 2568306818495384
Subject: Computer Science and Technology
Abstract/Summary: PDF Full Text Request
With the development of society into the digital age, data mining has become an important means of studying big data, and ensemble learning plays an important role in data mining. In complex settings it is difficult to construct a single classifier, and its output may be unstable; ensemble learning can effectively overcome this shortcoming. AdaBoost (Adaptive Boosting) is a representative ensemble learning algorithm: it only requires the base classifiers to be slightly better than random guessing, a property well suited to classifier construction in complex situations. The AdaBoost algorithm has therefore received great attention in industry. In view of the problems of AdaBoost, such as sensitivity to noise, low combination efficiency of the base classifiers, and excessive attention to difficult samples, this thesis conducts in-depth research on diversity, combination coefficients, margin theory, and clustering. The specific contributions are as follows:

(1) To improve the combination efficiency of AdaBoost, a new method for computing the base classifier coefficients is given, based on the distribution of the sample weights as well as the classification error rate of the base classifiers. This overcomes the drawback that the base classifier coefficient of traditional AdaBoost depends only on the error rate, while leaving the structure of the traditional algorithm unchanged; the improved algorithm still satisfies the error-convergence upper bound of traditional AdaBoost and better reflects the classification quality of the base classifiers. In addition, to improve the diversity among the base classifiers, a double-fault measure is introduced into base classifier selection to prevent the classifiers from becoming homogeneous during iteration. Combining these two points, the WD AdaBoost (AdaBoost based on weight and double-fault measure) algorithm is proposed. Experimental results show that the new algorithm further improves classifier performance.

(2) To counter the negative drift of sample margins in the later iterations of AdaBoost, two improved algorithms based on margin theory are proposed: WPIAda (sample weight and parameterization of improved AdaBoost) and WPIAda.M (sample weight and parameterization of improved AdaBoost - multitude). Both algorithms divide the update of the sample weights into four cases, increasing the weights of samples whose margin changes from positive to negative, so as to suppress the negative movement of the margin and reduce the number of samples whose margin stays near zero. WPIAda.M adjusts the sample weights over a smaller range and, to preserve the accuracy of the margin movement, computes the base classifier coefficients with the formula proposed in (1). Experimental results show that, compared with other algorithms, WPIAda and WPIAda.M reduce the test error and improve the AUC to varying degrees.

(3) To improve the diversity and consistency of the training samples, the training set is first divided into multiple clusters by a clustering algorithm, and a strong classifier is then trained on each cluster; the base classifier coefficients used on each cluster are the same as in (1), which ensures that each locally trained strong classifier has high accuracy. At prediction time, the weight of each strong classifier consists of two parts: the similarity between the test sample and the corresponding cluster, and the classification confidence of the strong classifier on the test sample. Finally, the strong classifiers trained on the clusters are combined through a weighted voting strategy, yielding the AECC (adaptive ensemble algorithm based on clustering and new base classifier coefficients) algorithm. Validated on UCI datasets, the experimental results show that the improved algorithm achieves better classification accuracy.
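The first contribution combines a reworked base classifier coefficient with a double-fault diversity screen. The following Python sketch shows the overall shape of such a scheme; since the abstract does not give the thesis's coefficient formula, the classical AdaBoost coefficient is used as a placeholder (the `w` parameter marks where the weight distribution would enter), and `df_threshold` is a hypothetical diversity cutoff, not a value from the thesis.

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    # decision stump: sign * (+1 if x[feat] <= thresh else -1)
    return sign * np.where(X[:, feat] <= thresh, 1, -1)

def best_stump(X, y, w):
    # exhaustive search for the stump minimizing weighted training error
    best, best_err = None, np.inf
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                pred = stump_predict(X, feat, thresh, sign)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best, best_err = (feat, thresh, sign), err
    return best, best_err

def coefficient(err, w):
    # Placeholder: the classical AdaBoost coefficient. The thesis's variant
    # additionally uses the distribution of the sample weights w, but the
    # abstract does not give that formula.
    eps = 1e-10
    return 0.5 * np.log((1 - err + eps) / (err + eps))

def double_fault(pred_a, pred_b, y):
    # fraction of samples misclassified by BOTH classifiers (lower = more diverse)
    return np.mean((pred_a != y) & (pred_b != y))

def wd_adaboost(X, y, rounds=10, df_threshold=0.5):
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble, prev_pred = [], None
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        pred = stump_predict(X, *stump)
        if err >= 0.5:
            break
        # diversity screen: reject a candidate too similar to the previous one
        # (the full method would search for an alternative; we simply stop)
        if prev_pred is not None and double_fault(pred, prev_pred, y) > df_threshold:
            break
        alpha = coefficient(err, w)
        ensemble.append((alpha, stump))
        w *= np.exp(-alpha * y * pred)   # standard AdaBoost reweighting
        w /= w.sum()
        prev_pred = pred
    return ensemble

def predict(ensemble, X):
    return np.sign(sum(alpha * stump_predict(X, *stump) for alpha, stump in ensemble))
```

On a toy 1-D dataset this trains a small stump ensemble and classifies the training points correctly; the point of the sketch is only where the coefficient and the diversity check plug into the standard boosting loop.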
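The second contribution splits the weight update into four cases by how each sample's margin moved between iterations, increasing the weights of samples whose margin turned from positive to negative. The abstract specifies only that case; the sketch below is therefore a hypothetical reading, with the multiplicative factors (`1.5`, `1.2`, `0.8`) chosen purely for illustration and not taken from the thesis.

```python
import numpy as np

def wpiada_weight_update(w, m_prev, m_new):
    # Hypothetical four-case, margin-aware reweighting inspired by the abstract.
    # m_prev / m_new are each sample's margin before and after the current round.
    factor = np.ones_like(w)
    pos_to_neg = (m_prev > 0) & (m_new <= 0)   # margin drifted negative
    still_neg  = (m_prev <= 0) & (m_new <= 0)  # persistently hard sample
    neg_to_pos = (m_prev <= 0) & (m_new > 0)   # margin recovered
    still_pos  = (m_prev > 0) & (m_new > 0)    # safely classified
    factor[pos_to_neg] = 1.5   # boost samples sliding negative (illustrative value)
    factor[still_neg]  = 1.2   # keep pressure on hard samples (illustrative value)
    factor[neg_to_pos] = 1.0   # leave recovering samples unchanged
    factor[still_pos]  = 0.8   # relax safe samples (illustrative value)
    w = w * factor
    return w / w.sum()         # renormalize to a distribution
```

The intent this illustrates is the one stated in the abstract: the positive-to-negative case ends up with the largest relative weight, suppressing further negative movement of the margin.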
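The third contribution clusters the training set, trains a strong classifier per cluster, and combines them by weighted voting using cluster similarity times prediction confidence. A minimal sketch follows; the per-cluster "strong classifier" is stood in by a nearest-class-mean model (the thesis uses its improved AdaBoost there), the inverse-distance similarity and confidence formulas are assumptions, and every cluster is assumed non-empty.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    # minimal Lloyd's-algorithm k-means, standing in for the thesis's clustering step
    X = np.asarray(X, float)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

class ClassMeans:
    # stand-in "strong classifier": nearest class mean with a crude
    # inverse-distance confidence (hypothetical; the thesis trains an
    # improved AdaBoost on each cluster instead)
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict_conf(self, x):
        d = ((self.means - x) ** 2).sum(-1)
        i = int(np.argmin(d))
        return self.classes[i], 1.0 / (1.0 + d[i])

def aecc_fit(X, y, k=2):
    centers, labels = kmeans(X, k)
    models = [ClassMeans().fit(X[labels == j], y[labels == j]) for j in range(k)]
    return centers, models

def aecc_predict(x, centers, models):
    votes = {}
    for center, model in zip(centers, models):
        sim = 1.0 / (1.0 + ((center - x) ** 2).sum())  # similarity to the cluster
        label, conf = model.predict_conf(x)
        votes[label] = votes.get(label, 0.0) + sim * conf  # similarity x confidence vote
    return max(votes, key=votes.get)
```

The design point mirrored here is that a test sample is judged mostly by the classifier trained on the cluster it resembles, while distant clusters contribute only marginally to the vote.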
Keywords/Search Tags:data mining, ensemble learning, diversity, margin theory, clustering