Churn Prediction Models For Anonymous Telecom Customer Dataset

Posted on: 2018-05-10; Degree: Master; Type: Thesis
Country: China; Candidate: R Q Li; Full Text: PDF
GTID: 2359330515497250; Subject: Systems Engineering
Abstract/Summary:
Churn prediction is the core of telecom customer relationship management. Models derived from data mining make it possible to identify, with high probability, the customers who are likely to churn. This helps operators design appropriate marketing strategies and provides data for evidence-based decision-making. A review of the literature shows that telecom customer churn prediction is generally studied as a binary classification problem. The field currently faces several key scientific problems. First, the unbalanced distribution of positive and negative examples restrains the performance of some classical algorithms. Second, privacy protection of commercial big data makes it harder for researchers to understand the real meaning of the features in a churn dataset. Third, traditional feature engineering cannot construct enough features, which also limits the performance of the algorithms.

To deal with the unbalanced distribution problem, an unbalanced ensemble classifier is proposed that combines sampling with ensemble learning. The method constructs small subsets with a balanced distribution of positive and negative examples by sampling with replacement. A logistic regression classifier is trained on each subset, and the probabilities predicted by all classifiers are averaged through a voting mechanism, which yields a homogeneous ensemble classifier as the final output.

To cope with the data comprehension problem caused by anonymous features, this thesis combines data discretization with one-hot encoding and presents a method for constructing high-dimensional features based on deep learning. The method constructs a large number of redundant features through a hierarchical network architecture to compensate for the lack of domain knowledge or expert experience.

In addition, it is well known that decision trees work effectively on unbalanced binary classification problems, so this thesis employs gradient boosting decision trees to build a telecom customer churn prediction model. Furthermore, a method that extracts low-dimensional features based on gradient boosting decision trees is proposed. The method combines ensemble learning with statistical theory, improving the prediction performance of the models while reducing computational complexity.

Experimental results demonstrate that the proposed algorithms effectively improve the prediction performance of the models. However, because the dataset is not large enough, the performance of some of the proposed algorithms is limited. Opportunities and challenges remain for further study.
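
The unbalanced ensemble classifier described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the subset size, the number of base classifiers, the learning rate and the synthetic data are all assumptions chosen for illustration, and the logistic regression is a plain gradient-descent version written with the Python standard library only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_balanced_ensemble(X, y, n_models=10, subset_size=20, seed=0):
    """Train one logistic model per balanced subset (hypothetical helper)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    models = []
    for _ in range(n_models):
        # Sample with replacement, the same number of examples from each class.
        idx = rng.choices(pos, k=subset_size) + rng.choices(neg, k=subset_size)
        models.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models

def ensemble_proba(models, x):
    # Soft voting: average the churn probabilities of all base classifiers.
    return sum(predict_proba(m, x) for m in models) / len(models)

# Synthetic, unbalanced toy data (about 10% churners); not the thesis dataset.
data_rng = random.Random(42)
X, y = [], []
for _ in range(300):
    churn = 1 if data_rng.random() < 0.1 else 0
    centre = 2.0 if churn else 0.0
    X.append([data_rng.gauss(centre, 1.0), data_rng.gauss(centre, 1.0)])
    y.append(churn)

models = fit_balanced_ensemble(X, y)
print(ensemble_proba(models, [2.5, 2.5]))    # point near the churn cluster
print(ensemble_proba(models, [-0.5, -0.5]))  # point near the non-churn cluster
```

Because every subset is balanced, each base classifier sees churners and non-churners equally often, and averaging the predicted probabilities smooths out the variance introduced by the small subsets.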
Keywords/Search Tags: churn prediction, unbalanced binary classification, ensemble classifiers, gradient boosting decision trees, deep learning