| Ensemble learning is an active research topic in pattern recognition. This thesis proposes a selective ensemble classifier based on BQIGSA and applies it to user churn prediction, with the aim of improving prediction performance and helping operators reduce their churn rate. For training the base classifiers, CART, Logistic and SVM are used as the single classifiers. Because user churn data is high-dimensional, the base classifiers are trained on different attribute subsets. The attribute subsets are selected in two steps: (1) Relief, MIC, conditional Gini gain, the Pearson correlation coefficient and the Fisher score are each used to calculate the correlation between the category and every attribute; these results are combined using Schwartz Sequential Dropping (SSD) to obtain the final correlation between each attribute and the category, and the 80% of attributes most relevant to the category are retained; (2) Affinity Propagation (AP) is used to cluster the retained attributes, and attributes are randomly selected from each cluster to construct the attribute subsets. Training yields seven sets of base classifiers, using CART, Logistic, SVM, CL (CART and Logistic), CS (CART and SVM), LS (Logistic and SVM) and CLS (CART, Logistic and SVM) as the single classifiers, respectively. The base classifiers are then selected in two stages. First, among base classifiers whose predictions on the training samples are completely identical, only the one with the highest AUC is kept. Second, the Binary Quantum-Inspired Gravitational Search Algorithm (BQIGSA) is applied to filter the retained base classifiers, with the fitness function set to G_mean and to AUC, respectively. The predictive performance of this method on the test set is compared with that of ensemble methods based on a genetic algorithm and on bagging. The results show that the selective ensemble classifier based on BQIGSA achieves the best classification performance, and that the method can reduce the impact of imbalanced samples on the classification results when a reasonable fitness function is set. In addition, BQIGSA with AUC as the fitness function matches actual business requirements better than BQIGSA with G_mean as the fitness function. When the recall of non-churned users is above 60%, the recall of churned users can reach 85.46%. |
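
As a rough illustration of the G_mean fitness mentioned in the abstract, the sketch below scores one candidate selection of base classifiers, encoded as a binary mask. It makes several assumptions that the abstract does not state: the selected classifiers are combined by simple majority vote, labels are 0 (non-churned) and 1 (churned), and the names `g_mean`, `majority_vote` and `fitness` are hypothetical. The BQIGSA search itself is not shown; only the function such a search would evaluate for each mask.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of the recalls of class 0 and class 1."""
    recalls = []
    for label in (0, 1):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total if total else 0.0)
    return math.sqrt(recalls[0] * recalls[1])


def majority_vote(predictions, mask):
    """Combine the base classifiers kept by a binary mask (assumed voting rule)."""
    chosen = [preds for preds, keep in zip(predictions, mask) if keep]
    if not chosen:
        return None  # an empty selection cannot predict anything
    n = len(chosen)
    # Predict 1 only when strictly more than half of the kept classifiers vote 1.
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*chosen)]


def fitness(mask, predictions, y_true):
    """G_mean of the voted ensemble; an empty selection scores 0."""
    voted = majority_vote(predictions, mask)
    return 0.0 if voted is None else g_mean(y_true, voted)


# Toy data (made up for illustration): 3 churned and 5 non-churned users,
# with the predictions of three hypothetical base classifiers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [
    [1, 1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
]

for mask in ([1, 0, 0], [0, 0, 1], [1, 1, 1]):
    print(mask, round(fitness(mask, predictions, y_true), 4))
```

Because G_mean multiplies the two class recalls, a selection that ignores the minority class scores close to zero however accurate it is on the majority class, which is why this kind of fitness can counteract sample imbalance.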