A Two-stage Feature Selection Method And An Ensemble Classifier Based Telecom Customer Churn Prediction Model

Posted on: 2017-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Xu
GTID: 2309330485451820
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of information and communication technology, the telecom market is becoming increasingly saturated, and competition among telecom operators is becoming fiercer. Customer churn prediction has therefore become a focus for telecom operators: identifying customers with a high probability of churning and developing appropriate retention strategies are of great importance to them, and both depend on an efficient and accurate churn prediction model. The main contents and achievements of this dissertation are as follows:

1. To address the high dimensionality of the telecom customer churn dataset, this dissertation studies how the feature subsets selected by principal component analysis, chi-square, and Fisher’s ratio affect the prediction results of naive Bayes, linear support vector machine, logistic regression, decision tree, and random forest classifiers, using the big data processing framework Spark and its machine learning library ML/MLlib. The experimental results show that the feature subsets selected by different feature selection methods affect different classifiers differently. Fisher’s ratio selects relatively better feature subsets and achieves better prediction results.

2. A two-stage feature selection method based on Fisher’s ratio and the prediction risk criterion is proposed to select important features. To address the shortcomings of existing feature selection methods in telecom customer churn prediction, the method combines the advantages of filter and wrapper feature selection. The feature subset it selects not only has stronger discriminating ability but also has a greater impact on the forecasting performance of the classifiers. The experimental results show that, compared with no feature selection or one-stage feature selection based on Fisher’s ratio, the proposed two-stage method improves forecasting performance.

3. A telecom customer churn prediction model based on the two-stage feature selection method and an ensemble classifier is proposed to further improve forecasting performance. First, five churn prediction models based on naive Bayes, linear support vector machine, logistic regression, decision tree, and random forest are constructed with the Spark machine learning library. An ensemble classifier is then built from the classifiers that show better forecasting performance, using the optimization combination forecasting method. The prediction probability of the ensemble classifier is the weighted sum of the prediction probabilities of its member classifiers. The experimental results show that the proposed churn prediction model improves forecasting performance compared with a single classifier.

In summary, using the big data processing framework Spark, this dissertation conducts a comprehensive study of the impacts of different feature selection methods on different classification models. To address the shortcomings of existing feature selection methods in telecom customer churn prediction, it proposes a two-stage feature selection method based on Fisher’s ratio and prediction risk that combines the advantages of filter and wrapper feature selection. On this basis, it proposes a telecom customer churn prediction model based on the two-stage feature selection method and an ensemble classifier. The experimental results show that the two-stage feature selection method improves the forecasting performance of each classification model, and that the churn prediction model combining it with the ensemble classifier achieves better forecasting performance still.
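The filter stage and the weighted ensemble described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's Spark implementation: it assumes the common two-class form of Fisher's ratio (squared gap between class means divided by the sum of class variances), omits the prediction risk stage, and uses hand-picked weights in place of those produced by the optimization combination forecasting method. All function names, feature names, and data values are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher's ratio of one feature (assumed definition):
    squared gap between class means over the sum of class variances."""
    churn = [v for v, y in zip(values, labels) if y == 1]
    stay = [v for v, y in zip(values, labels) if y == 0]
    gap = (fmean(churn) - fmean(stay)) ** 2
    spread = pvariance(churn) + pvariance(stay)
    if spread == 0:
        # Constant within both classes: useless if means match, perfect otherwise.
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def rank_features(columns, labels, top_k):
    """Filter stage: names of the top_k features by Fisher's ratio."""
    scores = {name: fisher_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def ensemble_probability(probabilities, weights):
    """Weighted sum of per-classifier churn probabilities.
    Weights are normalized here so that they sum to 1."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


# Hypothetical toy data: 1 = churned, 0 = stayed.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "monthly_minutes": [10, 12, 11, 30, 32, 31],  # separates the classes well
    "account_age": [5, 20, 9, 6, 21, 10],         # barely separates them
}

selected = rank_features(columns, labels, top_k=1)

# Hypothetical churn probabilities from three classifiers, with placeholder weights.
churn_probability = ensemble_probability([0.9, 0.6, 0.3], [0.5, 0.3, 0.2])
```

In this toy setup `monthly_minutes` is the feature retained by the filter stage, because its class means are far apart relative to the within-class variance. A wrapper stage such as the prediction risk criterion would then re-score the retained features against an actual classifier.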
Keywords/Search Tags: customer churn prediction, big data, Spark, class imbalance, two-stage feature selection, ensemble classifier