| With the spread of the Internet and mobile devices, mobile Internet advertising has gradually become an important marketing channel for enterprises, and advertisers and delivery platforms face the challenge of improving the click-through rate and revenue of advertisements. Accurately predicting advertisement click-through rate has therefore become an active research direction. This thesis presents a fusion-model approach to advertising click-through rate prediction that combines two machine learning algorithms: eXtreme Gradient Boosting (XGBoost) and Logistic Regression (LR). The LR algorithm has the advantages of fast computation, high accuracy, and ease of implementation and interpretation; its disadvantages are limited expressiveness and an inability to combine features automatically. In the XGBoost+LR fusion model, XGBoost is used to filter and combine features automatically and generate new feature vectors, which are then fed into the LR model for training to obtain the final predictions. The main work consists of two parts. The first part is exploratory data analysis. First, to avoid model training timeouts caused by the size of the dataset, 1 million samples were randomly selected from the 40 million samples in the original dataset to form a new dataset; second, descriptive statistics were computed for the feature variables to understand the data distribution; finally, guided by the data visualisation results, data pre-processing steps such as variable elimination, splitting, binarisation and encoding were completed. The second part is model construction and selection. First, the feature-engineered datasets are fed into the LR and XGBoost models according to the characteristics of each model, and the prediction performance of the two models before and after feature selection is compared to verify the necessity of feature selection; next, hyperparameters are optimized using grid search combined with the control-variable method to further improve model performance; then, the optimized XGBoost is used as the base learner, and its prediction results are feature-encoded and passed as a new training set to the meta-learner, i.e. the LR model, for training to obtain the final predictions; finally, the AUC values of each model on the test set are compared to validate the effectiveness of advertising click-through rate prediction based on the XGBoost+LR fusion model. |