| Cyberattacks have become increasingly common, and cross-site scripting (XSS) is the most commonly used attack method. XSS attacks also evolve quickly. Current schemes for automatically identifying and intercepting XSS attacks typically rely on rule matching or similar approaches, or on machine learning and deep learning. None of these methods identifies XSS attacks completely: some attacks are missed, and attackers who have learned the existing rules can bypass them. Such schemes are also hard to update and lack versatility, so XSS attacks are mistaken for normal access. The main work of this thesis is as follows. Firstly, because XSS attacks are highly concealed and easily obfuscated, this thesis adopts new word segmentation, word embedding, and vectorization methods so that the main characteristics of the data are retained without redundancy. A recognition model based on a multi-layer neural network is used to identify XSS attacks; two automatic identification networks, CNBLA and CNBGA, are proposed, and their parameter selection is explained. The convolutional neural network extracts and connects local features of the data well, while the recurrent networks BiGRU and BiLSTM extract features along the time dimension, associate preceding and following features, and combine and analyze them dynamically. Combining the two avoids the problems of a network that cannot be improved further or whose performance fluctuates widely. An attention mechanism is added to amplify the useful features, which improves accuracy in the experiments. Secondly, according to the different types of XSS attacks and their underlying logic, this thesis proposes a new deep-learning-based scheme for automatically identifying and intercepting XSS attacks. Access is divided into the user request stage and the stage in which the server returns its response to the browser, which yields a CNBLA network and a CNBGA network that each analyze different attack periods. This improves both the generalization ability and the accuracy of the model. Finally, based on the understanding and reproduction of XSS attack cases, empirical feature extraction is used: 33 features are summarized from all the collected XSS attack data and divided into two feature subsets for controlled experiments, namely URL composition content features and Web response content features. The random forest algorithm is then used to score the features and build a feature matching module. The result is a deep learning network that combines the feature matching module with analysis in different attack periods to identify XSS attacks automatically, in two variants: the CNBLA+feature-matching-module network and the CNBGA+feature-matching-module network, each based on analyzing different attack periods. Both improve the effectiveness and accuracy of automatic XSS attack detection. |
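
To make the idea of URL composition content features more concrete, the following is a minimal sketch in Python. It is not the thesis's feature set: the abstract does not list the 33 features, so the function name `url_features` and every feature name below are hypothetical and chosen only for illustration. The sketch decodes percent-encoding first, since obfuscated payloads are often encoded, and then counts a few simple indicators.

```python
import re
from urllib.parse import unquote

# Hypothetical patterns for illustration only; the thesis's actual
# 33 features are not enumerated in the abstract.
_SCRIPT_TAG = re.compile(r"<\s*script", re.IGNORECASE)
# Crude heuristic: matches "onload=", "onerror =", etc. It will also
# match benign parameters such as "online=1".
_EVENT_HANDLER = re.compile(r"\bon[a-z]+\s*=", re.IGNORECASE)


def url_features(url: str) -> dict:
    """Return a small dictionary of illustrative URL composition features."""
    # Undo percent-encoding once so that "%3Cscript%3E" becomes "<script>".
    decoded = unquote(url)
    return {
        "length": len(url),
        "percent_encoded": url.count("%"),
        "angle_brackets": decoded.count("<") + decoded.count(">"),
        "has_script_tag": int(bool(_SCRIPT_TAG.search(decoded))),
        "has_event_handler": int(bool(_EVENT_HANDLER.search(decoded))),
        "has_javascript_scheme": int("javascript:" in decoded.lower()),
    }


if __name__ == "__main__":
    print(url_features("https://example.com/search?q=hello"))
    print(url_features(
        "https://example.com/search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E"
    ))
```

Feature dictionaries of this kind could then be passed to a random forest to score each feature's importance, as the abstract describes; the scoring step itself is omitted here because the abstract does not specify how it is configured.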