| Click-through rate (CTR) prediction estimates the probability that a user clicks on an item or advertisement, and it is one of the most important research problems in computational advertising. Many CTR prediction models based on deep learning exist today. Most of them improve prediction accuracy by modeling the interactions between the features of different fields, and the embedding vectors used to represent those features strongly affect model performance. In existing models, however, the embedding vector of each feature is learned independently of the others. Because feature frequencies follow a long-tail distribution, most low-frequency features cannot learn a good vector representation, which seriously degrades prediction quality. To address this problem, this paper makes the following contributions. First, targeting low-frequency features, it studies a CTR feature embedding enhancement technique based on intra-field similarity. The method uses an intra-field feature similarity graph to represent the similarity relationships between features, and for sparse features a graph neural network aggregates information from similar features. This acts as a preprocessing step that enhances the feature embedding vectors and improves the quality of the learned representations. Specifically, unlike existing CTR prediction models, this paper applies a graph attention network to the embeddings of sparse features, propagating and updating them on the intra-field feature similarity graph to generate new feature embeddings. The enhanced embeddings can be used directly as the input of a CTR prediction model, so the proposed technique can be combined with any CTR prediction model to improve its prediction accuracy. Extensive experiments on the public Criteo and Avazu datasets show that the embedding enhancement technique effectively improves the prediction accuracy of a range of representative CTR prediction models. Second, the embedding enhancement technique above relies on the intra-field feature similarity graph to model the similarity relationships between features. Based on the co-occurrence of intra-field features with the features of other fields in CTR records, this paper proposes several definitions of similarity between intra-field features, together with a method for constructing the intra-field feature similarity graph. To quickly compute the Top-K features most similar to a given feature, it models intra-field similarity computation as a breadth-first traversal of the record-feature bipartite graph, and the proposed algorithm approximates feature similarity by pruning that traversal. Experiments analyze the impact of the different similarity definitions on multiple CTR prediction models, as well as the impact of the approximation on the accuracy and efficiency of intra-field similarity graph construction. |
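
The Top-K similarity search described above can be illustrated with a small Python sketch. It is not the paper's algorithm: the abstract gives neither the exact similarity definitions nor the pruning rule, so the sketch assumes one simple choice for each. Similarity is taken to be the number of co-occurrence paths between two features of the same field, and pruning is a hypothetical `max_fanout` cap on how many records are expanded per node. The function names and the toy records are also illustrative assumptions.

```python
from collections import Counter, defaultdict


def build_index(records):
    """Map each (field, value) feature to the ids of the records containing it.

    Together with `records`, this index represents the record-feature
    bipartite graph: records on one side, features on the other.
    """
    index = defaultdict(list)
    for rid, record in enumerate(records):
        for field, value in record.items():
            index[(field, value)].append(rid)
    return dict(index)


def top_k_similar(target, records, index, k=2, max_fanout=100):
    """Approximate Top-K same-field neighbours of `target`.

    Assumed similarity: number of 4-hop paths
    target -> record -> other-field feature -> record -> same-field feature.
    Assumed pruning: expand at most `max_fanout` records per node.
    """
    field, _ = target
    scores = Counter()
    # Hop 1: target feature -> records containing it (pruned).
    for rid in index.get(target, [])[:max_fanout]:
        # Hop 2: record -> co-occurring features from other fields.
        for context in records[rid].items():
            if context[0] == field:
                continue
            # Hop 3: context feature -> records containing it (pruned).
            for rid2 in index.get(context, [])[:max_fanout]:
                # Hop 4: record -> its feature in the target's field.
                value = records[rid2].get(field)
                if value is None:
                    continue
                candidate = (field, value)
                if candidate != target:
                    scores[candidate] += 1
    return scores.most_common(k)


# Toy CTR records (hypothetical field names and values).
records = [
    {"site": "s1", "device": "d1", "ad": "a1"},
    {"site": "s1", "device": "d2", "ad": "a2"},
    {"site": "s2", "device": "d1", "ad": "a1"},
    {"site": "s3", "device": "d3", "ad": "a3"},
]
index = build_index(records)
print(top_k_similar(("ad", "a1"), records, index, k=2))
```

On this toy data the only neighbour found for `("ad", "a1")` is `("ad", "a2")`, which shares the site `s1` with it. Lowering `max_fanout` to 1 cuts that path and returns no neighbours, which shows the trade-off between accuracy and cost that any pruned traversal has to manage.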