
Research On The Method Of User Profiling Construction Based On Search Engine

Posted on: 2019-07-03 | Degree: Master | Type: Thesis
Country: China | Candidate: Y K Li | Full Text: PDF
GTID: 2429330545462916 | Subject: Management Science and Engineering
Abstract/Summary:
Search engines are among the most valuable Internet applications. For website builders, a search engine platform is not only a convenience for site users but also an effective tool for studying their behavior. For enterprises, using search engine platforms for marketing, improving marketing conversion rates, and increasing customer loyalty and stickiness are key to survival. User profiling technology can help an enterprise pinpoint its user groups and adjust its marketing strategy according to feedback. Search engines, however, have a particular constraint: users can search without logging in, so their basic attributes are difficult to obtain. This thesis applies data mining and machine learning to the user search data that can be collected in order to predict users' basic attributes and construct search-engine-based user profiles. This benefits customer segmentation on the search platform, accurate targeting of consumer groups, and reduction of platform operating costs.

The main work of this thesis is as follows:
(1) Low-quality search engine user data is preprocessed. For word segmentation, the "jieba" tokenizer is chosen because it performs better, and only words with selected parts of speech are kept during segmentation. Text features are represented with the vector space model based on TF-IDF (term frequency-inverse document frequency), which performs well in both academia and industry.
(2) For the sparse, high-dimensional feature vectors, a safe feature screening method is used to remove feature words that have no effect on the feature vector. This reduces the feature dimension and improves efficiency without reducing the accuracy of the feature vector.
(3) Distributed word representations, which carry both word information and contextual semantic relations, are combined with the screened vector space model features to form the feature representation of short search engine texts. This compensates for the vector space model's inability to represent the contextual semantics and syntactic information of text.
(4) A two-layer stacking model, chosen for its flexibility and performance, is used to construct the search engine user profiles, with classifiers selected so that both classification speed and precision are preserved. The experimental results show that the stacking model predicts users' basic attributes better.

The conclusions of this thesis are as follows:
(1) Safe feature screening deletes inactive features and improves the efficiency of text classification.
(2) Introducing distributed word representations as supplementary semantic information improves classification accuracy.
(3) The stacking model is effective at predicting the basic attributes of search engine users. The experimental results show that the model retains good classification accuracy with less training data, and that accuracy rises as training data increases, so the model is stable.
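The feature combination described in item (3) of the abstract can be illustrated with a minimal sketch. The abstract does not say how the two representations are combined, so several choices here are assumptions made only for illustration: the queries are already segmented into tokens (for example by jieba), IDF is the unsmoothed log(N / df), a query's distributed representation is the average of its word vectors, and the two parts are joined by simple concatenation. The function names and the toy embedding table are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) for a list of token lists.

    Assumption: tf = count / doc length, idf = log(N / df), no smoothing.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def mean_embedding(doc, embeddings, dim):
    """Average the word vectors of the tokens that have an embedding."""
    vecs = [embeddings[tok] for tok in doc if tok in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def combined_features(docs, embeddings, dim):
    """Concatenate each TF-IDF vector with the document's mean embedding."""
    vocab, tfidf = tfidf_vectors(docs)
    features = [
        row + mean_embedding(doc, embeddings, dim)
        for row, doc in zip(tfidf, docs)
    ]
    return vocab, features


if __name__ == "__main__":
    # Toy, hypothetical data: two segmented queries and 2-d word vectors.
    queries = [["cheap", "phone"], ["phone", "review"]]
    toy_embeddings = {"cheap": [1.0, 0.0], "phone": [0.0, 1.0]}
    vocab, feats = combined_features(queries, toy_embeddings, dim=2)
    print(vocab)
    for row in feats:
        print(row)
```

Each resulting row has one TF-IDF entry per vocabulary word followed by the embedding dimensions, so a downstream classifier sees both the sparse lexical signal and the dense semantic signal. In a real pipeline the screening step from item (2) would reduce the TF-IDF part before concatenation.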
Keywords/Search Tags:Search engine, User profiling, Word distributed representation, Stacking model