Customer Churn Prediction Based On Data Mining

Posted on: 2019-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y You
Full Text: PDF
GTID: 2417330548996183
Subject: Applied statistics
Abstract/Summary:
Churn prediction has always been a primary part of customer relationship management and plays an important role in retaining customers and enhancing an enterprise's competitiveness. With the rapid development of the Internet and information technology, it has become possible to store and analyze massive amounts of user data, which also offers a new way of thinking about how to improve the efficiency of customer relationship management. More and more operational strategies rely on data mining to discover customers' behavioral habits and to estimate each user's probability of churn. The difficulty of customer churn prediction is that user data is complex, weakly structured, and imbalanced. Building a suitable prediction model therefore requires extracting indicator information from massive data and selecting an appropriate classification algorithm. This study is based on a large volume of real data. Data cleaning and feature extraction were carried out with the help of data visualization, and 13 variables were finally constructed from the original data for churn prediction. The class imbalance problem was addressed with undersampling and with cost-sensitive learning, respectively, and three prediction models were built: an undersampled logistic regression model, an undersampled Random Forest model, and an AdaCost model. All three achieved good classification results on the training set. Taken together, the three churn prediction models indicate that a long renewal interval and a recently cancelled transaction both signal a high tendency to churn. Historical transaction information, such as transaction frequency and the cumulative cancellation ratio, has little effect on the probability of churn. These findings provide a reference for user management on paid subscription sites.
Keywords/Search Tags: Customer churn analysis, Imbalanced data, Logistic regression, Random Forest, AdaCost
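
The abstract names undersampling as one of the two remedies for class imbalance but does not say which variant was used. As an illustration only, the following minimal sketch assumes plain random undersampling to a 1:1 class ratio. The function name `random_undersample`, the toy data, and the 90/10 split are hypothetical and do not come from the thesis.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class.

    Assumption: simple random undersampling to a 1:1 ratio; the
    thesis does not state which undersampling scheme it used.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_size = min(counts.values())

    kept = []
    for cls in counts:
        # indices of all rows belonging to this class
        idx = [i for i, y in enumerate(labels) if y == cls]
        # keep a random subset the size of the smallest class
        kept.extend(rng.sample(idx, minority_size))

    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 1 = churned, 0 = retained (90/10 imbalance).
labels = [0] * 90 + [1] * 10
rows = [[i] for i in range(100)]

X_bal, y_bal = random_undersample(rows, labels)
print(Counter(y_bal))  # both classes now have the same number of rows
```

The balanced sample would then be passed to a classifier such as logistic regression or Random Forest. Undersampling discards majority-class rows, which is the trade-off that cost-sensitive methods such as AdaCost avoid by reweighting misclassification costs instead.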