| Before building the model, the thesis first describes the data collection and processing work, which covers overall data exploration, missing-value description, outlier detection, and data analysis. The data analysis comprises crowd distribution analysis, user login analysis, and conversion rate analysis. Data exploration gives a basic understanding of the data: the data set contains 135,968 records and 40 features, including 4,639 users with purchase behavior. In missing-value processing, some fields such as city name have a high proportion of missing values and are handled accordingly; outliers are also handled accordingly, including by binning. Crowd distribution analysis covers city distribution, province distribution, and the distribution after city classification. User login analysis covers, for the purchasing crowd, the distribution of login days, the distribution of login intervals, the distribution of the time between the last login and the end of the period, and the distribution of login time. On the preprocessed data set, the thesis applies logistic regression, random forest, XGBoost, and LightGBM to predict user purchasing behavior. XGBoost performs best, with an accuracy of 0.99, precision of 0.86, recall of 0.94, F1 score of 0.9, and AUC of 0.99. The thesis then builds on XGBoost by establishing two fusion models, LR_XGBoost and BP_XGBoost. Over 100 random simulations, the fused models are more accurate than the model before fusion, and fused models that also include the original features perform better than those without them. Comparing the two fusion models, LR_XGBoost scores 0.001 higher than BP_XGBoost, so fusing with a more complex model is not necessarily better than fusing with a simpler one. |
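As a reading aid for the reported metrics, the following minimal sketch recomputes the F1 score from the stated precision and recall, and estimates the accuracy of a trivial classifier that always predicts "no purchase". It is not the thesis's evaluation code. The function names are hypothetical, and the baseline assumes that the 4,639 purchasers are counted among the 135,968 records, which the abstract does not state explicitly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(n_total: int, n_positive: int) -> float:
    """Return the accuracy of always predicting the majority class."""
    n_negative = n_total - n_positive
    return max(n_positive, n_negative) / n_total


# Precision and recall as reported in the abstract for XGBoost.
reported_precision = 0.86
reported_recall = 0.94
f1 = f1_score(reported_precision, reported_recall)

# Assumption: the 4,639 purchasers are a subset of the 135,968 records.
baseline = majority_baseline_accuracy(135_968, 4_639)

print(f"F1 from reported precision and recall: {f1:.3f}")
print(f"Accuracy of always predicting 'no purchase': {baseline:.3f}")
```

Under that assumption, purchasers make up only a few percent of the data, so accuracy alone is a weak signal for this task. Precision, recall, F1, and AUC say more about how well purchasers are identified.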