| In an era of rapid Internet development, online courses have become increasingly well known for their convenience and high-quality teaching staff. As online courses boom, many online education enterprises study and predict users’ purchasing behavior in order to raise the purchase rate. However, far fewer users end up buying a course than not, so the two user groups are extremely imbalanced and purchasing behavior is often difficult to predict. Solving the data imbalance problem and improving the traditional prediction process are therefore of great significance for enterprises that want to predict the purchasing behavior of online course users accurately, attract more users, and obtain long-term benefits. This study takes as its research object the in-class behavioral data of 135,617 trial-class users of an online education enterprise. The data set includes 52 features covering basic user information, user logins, user access, and user purchases. On this basis, the purchasing behavior of users is studied and their purchasing tendency is predicted. The paper first describes and summarizes some important features of the data set, then performs data preprocessing and feature selection, and finally builds the prediction model. During model construction, six machine learning models (support vector machine, random forest, decision tree, AdaBoost, GBDT, and LightGBM) were initially selected to model the data set, with F1-score and AUC as the evaluation metrics. Comparing the six models shows that LightGBM achieves the highest F1-score and AUC, which indicates that LightGBM has strong advantages in handling imbalanced data. SMOTE oversampling, random undersampling, and SMOTEENN hybrid sampling were then applied to the imbalanced data, and hybrid sampling was found to perform best. After hybrid sampling, the F1-score and AUC of several models improve greatly, reaching about 0.9. Finally, the three models with good predictive performance are combined using the stacking ensemble learning method. The stacked fusion model gives the best predictions, with F1-score and AUC reaching 0.98 and 0.99, respectively. The following conclusions can be drawn from this research. First, LightGBM has strong advantages in handling imbalanced data when applied to predicting online course purchasing behavior. Second, hybrid sampling greatly improves the predictive performance of several models. Third, the stacked fusion model predicts better than the other models. Fourth, users’ login and access behaviors during the trial course have a great impact on their purchase intention. |
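
To illustrate why F1-score and AUC, rather than plain accuracy, suit an imbalanced purchase data set, the following is a minimal Python sketch of the two metrics. It is not the implementation used in the study: the helper names `f1_score` and `auc_score` are hypothetical, and the ten labels and scores are made-up toy values, not data from the 135,617 users.

```python
# Illustrative sketch only: toy labels and scores, hypothetical helper names.

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_score(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg), fine for a toy set.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced sample: 2 buyers (label 1) among 10 users.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Baseline that predicts "no purchase" for every user.
all_negative = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)

# Made-up model scores, thresholded at 0.5 to obtain class predictions.
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("baseline accuracy:", accuracy)
print("baseline F1:", f1_score(y_true, all_negative))
print("model F1:", f1_score(y_true, y_pred))
print("model AUC:", auc_score(y_true, scores))
```

On this toy sample the all-negative baseline reaches an accuracy of 0.8 while its F1-score is 0, which is the kind of misleading result that motivates evaluating on F1-score and AUC when buyers are a small minority.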