| With the rapid development and maturity of Internet technology and continuing improvements in software and hardware performance, the services provided over the Internet have become increasingly rich, and Internet applications generate large volumes of traffic at every moment. When a user makes a Web request to access these resources, the Domain Name System (DNS) resolves the domain name to the corresponding IP address. DNS query records contain a range of information that reflects user behavior, such as access intentions and motivations, so analyzing them has great commercial value and security significance. In most cases, however, these data cannot be used directly: finding an effective representation of the data and building a suitable model are the basis for any subsequent analysis of users' personalized behavior. Therefore, a method for analyzing user access behavior based on word embedding is proposed. The main contents are as follows: 1. The state of research based on DNS data is analyzed. First, existing methods rely heavily on extended data and manual effort, which makes it difficult to meet real-time requirements in high-speed networks. Second, domain names are very short, carry limited information, and lack effective features for classification; in user behavior analysis it remains challenging to visualize and classify domain names because they have no natural order. 2. The basic theory and techniques of word embedding are studied. Word embedding has a significant advantage in representing contextual information in complex environments, and it offers a new approach to studying user access behavior using domain name data alone. 3. For users' active access behavior, a domain name similarity analysis method based on word embedding is proposed. The method uses the Skip-gram model to obtain domain name embeddings automatically and applies them to mine the semantic similarity of domain names as well as latent user preference characteristics. Experimental results indicate that the method captures the semantic information of domain names and users' access preferences. 4. For users' passive access behavior, an abnormal domain name detection method based on word embedding and a Long Short-Term Memory (LSTM) network is proposed. Domain names are vectorized by an embedding layer. Exploiting the LSTM's ability to learn long-distance dependencies, the method uses supervised learning to automatically learn the relationships between preceding and following characters in a domain name, capturing both the spelling characteristics of normal domains and the pseudo-random characteristics of botnet domains. The results show that the vectors trained by the embedding layer capture the characteristics of the data; combined with the LSTM network, they adapt well to the detection task, giving higher detection accuracy and a lower false alarm rate. |
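The abstract does not give implementation details for the Skip-gram step. The following is a minimal sketch, assuming each user's DNS queries are grouped into time-ordered sessions and each domain name is treated as a "word". It shows how Skip-gram with negative sampling could learn domain embeddings and then rank domains by cosine similarity. The function names, toy domains, and hyperparameters are hypothetical. A real system would more likely use an optimized library such as gensim's `Word2Vec` with `sg=1`.

```python
import math
import random
from collections import Counter


def skipgram_pairs(sessions, window=2):
    """Build (center, context) pairs from each user's time-ordered list of queried domains."""
    pairs = []
    for session in sessions:
        for i, center in enumerate(session):
            lo, hi = max(0, i - window), min(len(session), i + window + 1)
            pairs.extend((center, session[j]) for j in range(lo, hi) if j != i)
    return pairs


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def train_skipgram(sessions, dim=16, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Skip-gram with negative sampling; returns {domain: input embedding}."""
    rng = random.Random(seed)
    counts = Counter(d for s in sessions for d in s)
    vocab = sorted(counts)
    idx = {d: k for k, d in enumerate(vocab)}
    noise = [counts[d] ** 0.75 for d in vocab]  # word2vec-style unigram^0.75 noise distribution
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    pairs = [(idx[c], idx[o]) for c, o in skipgram_pairs(sessions, window)]
    ids = range(len(vocab))
    for _ in range(epochs):
        rng.shuffle(pairs)
        for c, o in pairs:
            # one positive (observed context) plus `negatives` sampled noise domains
            targets = [(o, 1.0)] + [(n, 0.0) for n in rng.choices(ids, weights=noise, k=negatives)]
            grad_in = [0.0] * dim
            for t, label in targets:
                score = sigmoid(sum(a * b for a, b in zip(w_in[c], w_out[t])))
                g = lr * (label - score)
                for k in range(dim):
                    grad_in[k] += g * w_out[t][k]  # accumulate before updating w_out
                    w_out[t][k] += g * w_in[c][k]
            for k in range(dim):
                w_in[c][k] += grad_in[k]
    return {d: w_in[idx[d]] for d in vocab}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, domain, topn=3):
    """Rank the other domains by cosine similarity to `domain`."""
    scores = [(d, cosine(vectors[domain], v)) for d, v in vectors.items() if d != domain]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:topn]


if __name__ == "__main__":
    # Hypothetical toy sessions: each list is one user's ordered DNS queries.
    sessions = [
        ["news.example", "sports.example", "weather.example"],
        ["shop.example", "pay.example", "cart.example"],
        ["sports.example", "news.example", "weather.example"],
        ["cart.example", "shop.example", "pay.example"],
    ]
    vectors = train_skipgram(sessions, dim=8, window=2, epochs=200)
    print(most_similar(vectors, "news.example"))
```

Treating a session as a sentence means that domains queried near each other by the same users receive similar vectors. This is the intuition behind mining domain semantics and user preferences from domain names alone.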
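The abstract names the detection architecture (an embedding layer feeding an LSTM, trained with supervision on normal and botnet domains) but not its configuration. The sketch below is written in pure Python and uses assumed sizes: `max_len=32`, 8-dimensional character embeddings, and 16 hidden units. It also relies on a hypothetical character vocabulary. It shows only the forward pass:

1. Each character index is looked up in an embedding table.
2. The standard LSTM gate equations are applied one character at a time.
3. A logistic output scores the final hidden state as the probability that the domain is a pseudo-random botnet domain.

The weights here are random. The supervised training step (backpropagation through time on labeled domains) is omitted and would normally be delegated to a deep-learning framework.

```python
import math
import random
import string

# Hypothetical character vocabulary; index 0 is reserved for padding.
CHARS = string.ascii_lowercase + string.digits + "-."
CHAR_TO_IDX = {ch: i + 1 for i, ch in enumerate(CHARS)}
UNK = len(CHARS) + 1  # index for characters outside CHARS
VOCAB_SIZE = UNK + 1


def encode_domain(domain, max_len=32):
    """Map a domain to max_len character indices: keep the rightmost characters, left-pad with 0."""
    ids = [CHAR_TO_IDX.get(ch, UNK) for ch in domain.lower()[-max_len:]]
    return [0] * (max_len - len(ids)) + ids


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


class CharLSTMClassifier:
    """Embedding layer -> single-layer LSTM -> logistic output (probability of a botnet domain)."""

    def __init__(self, vocab_size=VOCAB_SIZE, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.embedding = init(vocab_size, embed_dim)  # one vector per character index
        # Gates: input (i), forget (f), candidate (g), output (o); each acts on [x_t; h_{t-1}].
        self.W = {gate: init(hidden_dim, embed_dim + hidden_dim) for gate in "ifgo"}
        # Forget-gate bias starts at 1.0, a common initialization heuristic.
        self.b = {gate: [1.0 if gate == "f" else 0.0] * hidden_dim for gate in "ifgo"}
        self.w_out = init(1, hidden_dim)[0]
        self.b_out = 0.0

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        """One LSTM time step: returns the new hidden and cell states."""
        z = x + h  # concatenate the character embedding and the previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        cand = [math.tanh(v) for v in self._affine("g", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def predict_proba(self, indices):
        """Run the LSTM over the non-padding characters and score the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for idx in indices:
            if idx == 0:
                continue  # skip padding positions
            h, c = self.step(self.embedding[idx], h, c)
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)


if __name__ == "__main__":
    model = CharLSTMClassifier()  # untrained: weights stay random until fitted on labeled data
    for name in ["google.com", "xjw3k9qpz7vb.net"]:
        print(name, round(model.predict_proba(encode_domain(name)), 4))
```

The LSTM processes characters in order, so its cell state can carry information across the whole name. This is the long-distance property the abstract relies on to separate readable spellings from pseudo-random character sequences once the model is trained.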