
Online Course Marketing Prediction Based On Ensemble Learning

Posted on: 2024-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: J J Fan
GTID: 2557307076492084
Subject: Applied statistics
Abstract/Summary:
With the continuous progress of science and technology, online education has become a major trend in the development of the education industry. According to statistics, the market size of China’s online education industry grew from 251.76 billion yuan in 2018 to 543.35 billion yuan in 2022, a cumulative increase of about 116% in four years, and the industry has broad development prospects. Telemarketing, one of the important marketing methods in the online education industry, has a long history and offers two-way communication and rapid feedback, but traditional telemarketing demands considerable labor and time and has low marketing efficiency. Using machine learning to analyze user data and predict the likelihood that users will buy online course products is therefore of great significance for improving the telemarketing efficiency of online education companies.

This thesis takes part of the user data of a domestic online education company from July 2022 as its research object. The data set contains basic user registration information and a series of features derived from user usage records. Processing and analysis of the data set mainly comprise descriptive analysis, data cleaning, model construction and evaluation, and customer rating and group marketing. LightGBM, XGBoost, AdaBoost and random forest were selected for modeling, and Accuracy, Precision, Recall, F1 score and AUC were used as the evaluation metrics for each model. Because the data are imbalanced, with a positive-to-negative sample ratio close to 1:9, six sampling methods (SMOTE, Borderline-SMOTE, ENN, Tomek, SMOTE-ENN and SMOTE-Tomek) were combined with the four algorithms, and the performance of each model before and after sampling was compared. The results show that the Recall, F1 and AUC of each model improve to varying degrees after sampling, indicating that models combined with a sampling method predict better.

Building on the improvement brought by the sampling methods, this thesis uses the four models as base learners, pairs each with the sampling method that works best for it, uses logistic regression as the meta-learner, and constructs an improved Stacking model by weighted fusion, with five-fold cross-validation and grid-search parameter optimization. Compared with the traditional Stacking model, the improved model takes into account the diversity of both base learners and sampling methods, which improves its overall predictive performance and generalization ability. The results show that the Recall, F1 and AUC of the improved Stacking model increase significantly, giving it clear advantages over the four base learners and the traditional Stacking model.

To study how each feature influences whether users buy course products, LightGBM, the best-performing of the four models, was selected, and its default logarithmic loss function was replaced with Focal Loss, which is better suited to imbalanced data. The results show that the improved LightGBM model performs better. Based on the feature importance of the LightGBM model, the influence of each feature on whether users buy is analyzed, and some key indicators are briefly discussed.

Finally, the improved Stacking model was used to predict the probability of each user purchasing online course products, the probabilities were converted into user ratings, and all users were grouped into groups of 50. The results show that purchase rates differ significantly between groups: the purchase rate of the highest-scoring group is about 97.98%, while that of the lowest-scoring group is only 0.84%. Telemarketing efficiency would be greatly improved if users were grouped according to the model and scoring method constructed in this thesis.
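
The scoring and grouping step described at the end of the abstract can be illustrated with a minimal sketch. The abstract does not give the formula that turns probabilities into ratings, so this example simply assumes that users are ranked by predicted probability and split into nearly equal groups; the function name `group_purchase_rates` and the toy data are hypothetical.

```python
from typing import List, Tuple


def group_purchase_rates(
    probs: List[float], bought: List[int], n_groups: int
) -> List[Tuple[int, float]]:
    """Rank users by predicted probability (highest first), split them into
    n_groups nearly equal groups, and return (group_size, purchase_rate)
    for each group. Ranking by raw probability is an assumption; the thesis
    may use a different rating formula."""
    if len(probs) != len(bought):
        raise ValueError("probs and bought must have the same length")
    if not 1 <= n_groups <= len(probs):
        raise ValueError("n_groups must be between 1 and the number of users")
    # Sort user indices from most to least likely to buy.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # The first `extra` groups receive one additional user.
    base, extra = divmod(len(order), n_groups)
    results = []
    start = 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        members = order[start:start + size]
        rate = sum(bought[i] for i in members) / size
        results.append((size, rate))
        start += size
    return results


# Hypothetical toy data: predicted probabilities and outcomes (1 = bought).
probs = [0.95, 0.10, 0.80, 0.05, 0.60, 0.20, 0.90, 0.15]
bought = [1, 0, 1, 0, 1, 0, 1, 0]
rates = group_purchase_rates(probs, bought, n_groups=4)
for rank, (size, rate) in enumerate(rates, start=1):
    print(f"group {rank}: {size} users, purchase rate {rate:.2f}")
```

If the model ranks users well, purchase rates fall steadily from the first group to the last, which is the pattern the abstract reports between its highest-scoring and lowest-scoring groups.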
Keywords: Online education, Telemarketing, LightGBM, Stacking
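
The Focal Loss modification mentioned in the abstract can be illustrated with the standard binary form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). This is only a sketch of the loss value for one sample: the `alpha` and `gamma` values are common defaults rather than settings reported by the thesis, and a custom LightGBM objective would additionally need the gradient and Hessian, which are omitted here.

```python
import math


def log_loss(p: float, y: int) -> float:
    """Standard logarithmic loss for one sample with predicted probability p."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one sample.

    p_t is the probability assigned to the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    alpha and gamma are assumed defaults, not values from the thesis."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (true class predicted with 0.9) versus a hard positive
# (true class predicted with only 0.1), both with predicted p = 0.1.
easy_ratio = focal_loss(0.1, 0) / log_loss(0.1, 0)
hard_ratio = focal_loss(0.1, 1) / log_loss(0.1, 1)
print(f"easy negative keeps {easy_ratio:.4f} of its log loss")
print(f"hard positive keeps {hard_ratio:.4f} of its log loss")
```

The easy negative keeps a much smaller share of its log loss than the hard positive, so training concentrates on the misclassified minority-class samples, which is why the loss suits imbalanced data such as the roughly 1:9 ratio described in the abstract.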