Research For Click-through-rate Prediction Based On XGBoost Method

Posted on: 2020-05-17  Degree: Master  Type: Thesis
Country: China  Candidate: H Zeng
GTID: 2370330596995416  Subject: Control engineering
Abstract/Summary:
Advertising has long been one of the main sources of income for Internet companies. Internet leaders (such as Google, Facebook, Ali, etc.) have made advertising a core business, and more and more companies recognize that technology-driven ad delivery is more competitive. The essence of Click-Through-Rate (CTR) estimation research is to maximize the interests of advertisers, advertising platforms and users: advertisers obtain a high click-through rate, the advertising platform maximizes its revenue, and user satisfaction increases. Improving the accuracy of CTR estimation for advertising is therefore both challenging and important.

Industry research on CTR estimation is relatively mature, but some shortcomings still deserve careful thought. First, the widely used logistic regression (LR) model is the first choice for most companies making CTR estimates. The model is simple, easy to implement and fast to train, and it can complete iterative training even on billions of records. Its learning ability is limited, however: it cannot extract nonlinear relationships between features, so engineers with a computational advertising background are needed to construct feature combinations by hand. Second, as a company's business expands over time, the amount of data to be processed keeps growing. How to compute CTR values quickly with the current model while keeping the advertising module running stably is a problem worthy of attention.

(1) To address the difficulty of expressing nonlinear relationships between features in a single LR model, this thesis adds the eXtreme Gradient Boosting (XGBoost) model in front of LR, because XGBoost constructs combined features automatically and builds trees in parallel. The features produced by XGBoost are used as the input to the iterative LR computation. The XGBoost+LR fusion model can not only effectively mine the hidden relationships between features, but also improve the estimation accuracy.

(2) To handle significant changes in advertising data volume or migration between business scenarios, the model's computing environment needs to be deployed on a distributed computing platform with better scalability, fault tolerance and high throughput. The main work of the platform is to train the click-through-rate estimation model offline, push the trained model online, and compute the CTR values of the ads in the candidate advertisement library in real time.
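
The idea behind the fusion model in (1) can be illustrated with a minimal sketch. It is not the thesis implementation: a single hand-written depth-2 tree (the hypothetical `leaf_index`) stands in for the trees that XGBoost would learn, and a small gradient-descent LR (the hypothetical `train_lr`) stands in for a production LR trainer. Each sample is mapped to the leaf it falls into, the leaf is one-hot encoded, and LR is trained on those indicators instead of the raw features. The toy data, learning rate and epoch count are all assumptions chosen for illustration.

```python
import math

def leaf_index(x):
    """Hypothetical toy tree standing in for one trained boosted tree.
    Splits on feature 0, then feature 1, and returns a leaf id in 0..3."""
    if x[0] < 0.5:
        return 0 if x[1] < 0.5 else 1
    return 2 if x[1] < 0.5 else 3

def one_hot_leaves(x, trees):
    """Concatenate one-hot leaf indicators from every (tree, n_leaves) pair."""
    features = []
    for tree, n_leaves in trees:
        vec = [0.0] * n_leaves
        vec[tree(x)] = 1.0
        features.extend(vec)
    return features

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_ctr(x, w, b):
    """LR estimate of the click probability for one feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_lr(rows, labels, epochs=2000, lr=0.5):
    """Batch gradient descent on log loss; returns (weights, bias)."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_ctr(x, w, b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(rows, labels, w, b):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum((predict_ctr(x, w, b) >= 0.5) == bool(y)
               for x, y in zip(rows, labels))
    return hits / len(rows)

# Toy XOR-style data (an assumption): a click happens only when exactly
# one of the two features is active, which no linear model can express.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 0]

TREES = [(leaf_index, 4)]
X_leaf = [one_hot_leaves(x, TREES) for x in X]

w_raw, b_raw = train_lr(X, y)          # LR on raw features
w_leaf, b_leaf = train_lr(X_leaf, y)   # LR on tree-derived features

raw_acc = accuracy(X, y, w_raw, b_raw)
leaf_acc = accuracy(X_leaf, y, w_leaf, b_leaf)
print("raw LR accuracy:", raw_acc)
print("leaf-encoded LR accuracy:", leaf_acc)
```

On this toy data the LR trained on raw features cannot separate the two classes, while the LR trained on the leaf indicators classifies all four rows correctly, because each leaf already encodes a combination of the two features. In a real pipeline the leaf indices would come from a trained gradient-boosted model rather than a hand-written function.
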
Keywords/Search Tags: click-through rate, combined feature, fusion model, distributed computing