
Prediction of E-Commerce Customer Churn Based on Machine Learning

Posted on: 2024-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Liu
Full Text: PDF
GTID: 2568307106986209
Subject: Applied statistics
Abstract/Summary:
With the continuous growth in the number of Internet users and online shoppers, the e-commerce market has become increasingly large and has broad prospects for development. More and more e-commerce platforms and merchants are entering the market and competing for valuable but limited customer resources. As the e-commerce industry matures and grows, the number of new customers is decreasing, which directly raises the cost and difficulty of customer acquisition. It is therefore of great importance, on the one hand, to predict a customer's churn tendency in advance from the customer's historical behavior data and related information, so that merchants and enterprises can take effective measures in time to reduce the churn rate; and, on the other hand, after customers have churned, to analyze the reasons for churn, refine target customer groups, and carry out precision marketing, so as to improve customer satisfaction and corporate benefits.

The innovation of this thesis is to build on research results on customer churn in the telecom and banking industries and extend them to the field of e-commerce, taking the imbalanced customer churn dataset of a foreign e-commerce enterprise as the research object. Individual ensemble algorithms and the Stacking algorithm from machine learning are each used to build customer churn prediction models on this dataset. A multi-dimensional model evaluation system is established to compare the models and identify the one with the best predictive performance for e-commerce customer churn. This provides a basis for e-commerce enterprises and merchants to judge customers' churn tendency in time, reduce customer churn, and adopt targeted marketing strategies to win back lost customers.

First, an exploratory analysis of the data was conducted, including descriptive statistical analysis, visualization with relevant charts, and examination of the data distributions, with attention to possible missing values and class imbalance in the dataset. The data were then pre-processed. Missing values identified in the dataset were handled by real-value filling. Because machine learning methods require numerical input, categorical variables were converted with one-hot encoding, and numerical variables were normalized to remove the influence of their differing units and scales. After that, the dataset was divided into training and test sets, and the class imbalance was handled with the SMOTE algorithm, yielding the processed data used for the churn prediction analysis.

Second, an empirical analysis of e-commerce customer churn prediction was carried out. Different ensemble learning methods were used to build models on the processed training set, the fitted models were used to predict the test set, and the evaluation metrics of each model were calculated. The basic idea of ensemble learning is to combine many different learners into a new learner, so as to improve the generalization and predictive performance of the model. In the first step, single models were used to predict customer churn: Bagging, AdaBoost, GBDT, and XGBoost were each used to build a prediction model on the processed training set, with the important parameters tuned over many rounds. The models were evaluated by accuracy, precision, recall, and F1 score, all derived from the confusion matrix, together with the AUC value of the ROC curve, and the best model under the same conditions was identified.

Then, a two-layer Stacking model was used to predict customer churn. Bagging, GBDT, XGBoost, and random forest were used as first-layer classifiers, logistic regression and KNN were used as second-layer classifiers, and the two layers were combined in different ways to build Stacking models. The best combination selected uses Bagging, GBDT, and XGBoost as the first-layer classifiers and logistic regression as the second-layer classifier. After the Stacking model with this combination was optimized and fitted many times, the average values of the evaluation metrics across these runs were obtained.

Finally, the research reaches the following conclusions: among the single models, the model built with the XGBoost algorithm has the best predictive performance for e-commerce customer churn; the evaluation metrics of the Stacking model are close to those of XGBoost, indicating that the Stacking model also predicts well.
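The pre-processing steps described in the abstract (filling missing values, one-hot encoding categorical variables, and normalizing numerical variables) can be sketched in plain Python. This is a minimal illustration, not the thesis's code. It assumes records are dicts, missing numerical values appear as `None` and are filled with the column mean (a stand-in, since the abstract does not define its "real-value filling" rule), categorical columns have no gaps, and min-max scaling to [0, 1] is the normalization. The column names `tenure` and `city` are hypothetical.

```python
from statistics import mean

def preprocess(rows, categorical, numerical):
    """Impute, one-hot encode, and min-max scale a list of dict records.

    Assumptions (not from the thesis): missing numerical values are None and
    are filled with the column mean; categorical columns have no gaps;
    numerical columns are scaled to [0, 1].
    """
    # Mean imputation, an assumed stand-in for "real-value filling".
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in numerical}
    filled = [{**r, **{c: means[c] if r[c] is None else r[c] for c in numerical}}
              for r in rows]

    # Column ranges for min-max scaling and sorted levels for one-hot encoding.
    lo = {c: min(r[c] for r in filled) for c in numerical}
    hi = {c: max(r[c] for r in filled) for c in numerical}
    levels = {c: sorted({r[c] for r in filled}) for c in categorical}

    matrix = []
    for r in filled:
        row = []
        for c in numerical:
            span = hi[c] - lo[c]
            row.append((r[c] - lo[c]) / span if span else 0.0)
        for c in categorical:
            row.extend(1.0 if r[c] == level else 0.0 for level in levels[c])
        matrix.append(row)
    return matrix

# Hypothetical columns: "tenure" (numerical, one value missing), "city" (categorical).
rows = [{"tenure": 2, "city": "A"},
        {"tenure": None, "city": "B"},
        {"tenure": 10, "city": "A"}]
X = preprocess(rows, categorical=["city"], numerical=["tenure"])
print(X)  # [[0.0, 1.0, 0.0], [0.5, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

In a real train/test split, the means, minima, maxima, and category levels would be computed on the training set only and then reused for the test set, so that no test information leaks into pre-processing.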
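SMOTE, used in the abstract to handle class imbalance, oversamples the minority class by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified variant that picks base samples at random. The function name `smote`, the default `k=5`, and the fixed seed are illustrative choices, not settings reported in the thesis. The imbalanced-learn library provides a full implementation as `imblearn.over_sampling.SMOTE`.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: create n_new synthetic minority-class points.

    Each point is x + u * (x_nn - x), where x is a randomly chosen minority
    sample, x_nn is one of its k nearest minority neighbours, and u ~ U(0, 1).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # x itself is excluded from its neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x by Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic

# Hypothetical scaled features of four churned customers (the minority class).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies. Applying the step after the train/test division, and only to the training split, keeps synthetic rows out of the evaluation data.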
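The idea of combining many learners into one can be illustrated with Bagging, the first of the single ensemble methods named in the abstract. Each weak learner is trained on a bootstrap resample of the training rows, and the ensemble predicts by majority vote. The decision-stump learner, the helper names, and `n_estimators=25` are hypothetical choices for this sketch; the thesis's own base learners and tuned parameters are not given in the abstract.

```python
import random

def fit_stump(X, y):
    """Hypothetical weak learner: the single-feature threshold rule with fewest errors."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                preds = [1 if sign * (x[j] - thr) > 0 else 0 for x in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, sign)
    _, j, thr, sign = best
    return lambda x: 1 if sign * (x[j] - thr) > 0 else 0

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Bagging: fit each learner on a bootstrap resample, then take a majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    # Predict 1 (churn) when more than half of the learners vote for it.
    return lambda x: int(sum(m(x) for m in models) * 2 > len(models))

# Hypothetical one-feature data: label 1 marks churned customers.
X_bag = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y_bag = [0, 0, 0, 0, 1, 1, 1, 1]
predict_bag = fit_bagging(X_bag, y_bag)
```

Averaging over bootstrap resamples mainly reduces variance, which is why Bagging is usually paired with unstable base learners such as decision trees.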
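The evaluation metrics named in the abstract follow directly from the confusion matrix: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall. AUC can be computed as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The sketch assumes churn is encoded as the positive class `1`; the labels and scores in the example are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) with 1 as the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the share of (churner, non-churner) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

For these eight rows, TP = 2, FP = 1, FN = 1, and TN = 4, so accuracy is 0.75 and precision, recall, and F1 are all 2/3. Thirteen of the 15 churner/non-churner score pairs are ordered correctly, giving AUC = 13/15 (about 0.867). Unlike the first four metrics, AUC does not depend on the chosen threshold.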
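The two-layer Stacking scheme described in the abstract can be sketched as follows: first-layer models produce out-of-fold churn scores, and those scores become the input features of a second-layer logistic regression. To keep the example dependency-free, it swaps the thesis's first-layer Bagging, GBDT, and XGBoost models for two toy learners (a k-NN scorer and a nearest-centroid scorer) and fits the logistic regression by plain gradient descent. All names, the fold scheme (`i % n_folds`), the learning rate, and the data are assumptions. scikit-learn's `StackingClassifier` offers a production implementation of the same idea.

```python
import math

def knn_learner(k):
    """Hypothetical first-layer learner: share of churners among the k nearest rows."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
            return sum(y[i] for i in nearest) / len(nearest)
        return predict
    return fit

def centroid_learner(X, y):
    """Hypothetical first-layer learner: closeness to the churn centroid, in [0, 1]."""
    def centroid(label):
        pts = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(pts) for col in zip(*pts)]
    c1, c0 = centroid(1), centroid(0)
    def predict(x):
        d1, d0 = math.dist(x, c1), math.dist(x, c0)
        return d0 / (d0 + d1) if d0 + d1 else 0.5
    return predict

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def logistic_learner(Z, y, lr=0.5, epochs=2000):
    """Second-layer learner: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, gw)]
        b -= lr * gb / len(Z)
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

def fit_stacking(X, y, base_fits, meta_fit, n_folds=3):
    """Two-layer stacking: out-of-fold first-layer scores train the second layer."""
    n = len(X)
    Z = [[0.0] * len(base_fits) for _ in range(n)]
    for f in range(n_folds):
        train = [i for i in range(n) if i % n_folds != f]
        for m, fit in enumerate(base_fits):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in range(f, n, n_folds):  # rows held out in this fold
                Z[i][m] = model(X[i])
    meta = meta_fit(Z, y)
    bases = [fit(X, y) for fit in base_fits]  # refit the first layer on all rows
    return lambda x: meta([base(x) for base in bases])

# Hypothetical two-feature data: the last six rows are churned customers.
X_st = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.2], [0.2, 0.25], [0.25, 0.15],
        [0.8, 0.9], [0.9, 0.8], [0.85, 0.7], [0.7, 0.8], [0.75, 0.85], [0.9, 0.9]]
y_st = [0] * 6 + [1] * 6
predict_stack = fit_stacking(X_st, y_st, [knn_learner(3), centroid_learner],
                             logistic_learner)
```

Out-of-fold scores are the key design choice: if the second layer were trained on first-layer predictions for rows those models had already seen, it would learn from over-confident inputs and overfit.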
Keywords/Search Tags: Customer Churn, XGBoost Algorithm, Stacking Algorithm, Model Evaluation