| With the rise of Internet technology, Web applications have penetrated all aspects of our lives, and many Web vulnerabilities have followed. Among them, cross-site scripting (XSS) is a security vulnerability that frequently occurs in such applications. It injects malicious scripts and links into normal web pages in order to steal users' sensitive data. How to detect these cross-site scripting attacks is a hot topic in Web security research. Traditional methods require considerable time and effort to extract the characteristics of the attack data, and they also require a certain amount of experience to achieve good results. In recent years, researchers have proposed using machine learning for detection, but in the current era of big data the amount of data keeps growing, and such shallow learning methods can no longer meet detection requirements. This paper therefore proposes a detection method based on deep learning. First, we use the simple bag-of-words model, the unsupervised Word2vec model, and the GloVe model to convert the collected data into word vectors. These word vectors replace manually extracted features, which saves manual effort. A deep learning model then learns the characteristics of these word vectors to distinguish attacks from normal data. Finally, comparison experiments verify that the deep learning model achieves very good results in cross-site scripting attack detection. In this article, the collected web page data contains more than 200,000 positive and negative examples of URLs, JavaScript scripts, and HTML tags. Three different sets of word vectors are generated with the bag-of-words model, the Word2vec model, and the GloVe model, and are fed into the deep learning models LSTM, CNN, and CNN+LSTM for training. The goal is to classify the data set into XSS and non-XSS samples. The appropriate hyperparameters of the CNN+LSTM model were selected through several experiments. Finally, the recall rate and accuracy of the three models were calculated from the experimental results. The results show that the CNN+LSTM model proposed in this paper outperforms the LSTM and CNN models commonly used in deep learning, and that the word vectors obtained from the Word2vec model outperform those obtained from the other two models. The method also achieves better performance than general shallow machine learning, recording a highest accuracy of 99.870% and a lowest false positive rate of 0.08%. |