| Computers and networks now reach into every aspect of daily life, and network security is receiving ever more attention. In February 2014, China established the Central Leading Group for Cybersecurity and Informatization, marking the elevation of cybersecurity to a national strategic level. The Web backdoor (Webshell) is one of the main threats to network security. Driven by economic interests and the adoption of various new technologies, Web backdoors have become extremely numerous, and new types keep emerging, so the security threat index rises year by year. Webshell detection identifies backdoor files on a website and thereby helps keep the site secure. Existing Webshell detection methods rely mainly on extracting feature functions, and their detection rates are low. This thesis studies the key technologies of Webshell detection and the existing detection methods. Drawing on research in natural language processing, it converts Webshell files into word-vector form and, on the basis of these file vectors, investigates methods for detecting Webshells and their variants. The work and contributions can be summarized as follows. 1. Building on statistical feature extraction and feature function extraction, three feasible Webshell detection methods are analyzed and studied: a decision tree, an extreme gradient boosting tree (XGBoost), and a back-propagation (BP) neural network. All three are verified experimentally. Overall, because their feature selection is limited, these three methods based on feature functions and machine learning each have drawbacks, and their classification performance is not good enough. 2. Based on an analysis and in-depth study of data preprocessing algorithms for Webshell detection, a new word segmentation method is proposed. Its basic idea is to split text at spaces and at special characters, that is, characters that are neither letters nor digits. This preserves the meaning of each word once it is vectorized while keeping the vocabulary relatively small. It overcomes the drawbacks of segmenting at spaces alone, which consumes a large amount of memory and leaves many low-frequency words filled in with generic vectors, so that their specific meanings are lost. 3. A detection model based on a convolutional neural network (CNN) is proposed that learns to classify from samples. The model uses word2vec to vectorize the samples and a CNN to train on and detect them. Experimental results show that the model achieves a high detection rate and good detection performance. 4. A Webshell detection model based on an attention mechanism is proposed. This model focuses mainly on associations between words within the same line: word2vec vectorizes the samples, and a GRU combined with an attention mechanism trains on and detects them. Experiments verify that the model achieves a high detection rate and good detection performance. 5. The proposed models are compared with traditional detection algorithms based on feature extraction, which target only one Webshell scripting language and perform poorly when Webshells written in several languages must be detected at the same time. |
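Contribution 1 feeds statistical and feature-function features into a decision tree, XGBoost and a BP network. The abstract does not list the features, so the sketch below assumes three that are common in Webshell detection: the Shannon entropy of the file's characters (encoded payloads tend to score high), the length of the longest whitespace-separated word, and the number of calls to a hypothetical list of dangerous PHP functions (`SUSPICIOUS_FUNCTIONS`). It only builds the feature vector; the classifiers themselves are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical list of functions often abused in PHP Webshells; the thesis does
# not publish its list, so this is an illustrative assumption.
SUSPICIOUS_FUNCTIONS = ("eval", "assert", "system", "exec", "shell_exec",
                        "passthru", "base64_decode", "gzinflate")

# Match a suspicious name as a whole word followed by an opening parenthesis.
_CALL_PATTERN = re.compile(
    r"\b(?:" + "|".join(SUSPICIOUS_FUNCTIONS) + r")\s*\(", re.IGNORECASE)


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the character distribution."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def longest_word(text: str) -> int:
    """Length of the longest whitespace-separated run (long runs hint at encoded data)."""
    return max((len(word) for word in text.split()), default=0)


def suspicious_calls(text: str) -> int:
    """Number of calls to functions in SUSPICIOUS_FUNCTIONS, e.g. 'eval('."""
    return len(_CALL_PATTERN.findall(text))


def extract_features(text: str) -> list[float]:
    """Fixed-length vector that a tree model or BP network could consume."""
    return [shannon_entropy(text), float(longest_word(text)), float(suspicious_calls(text))]
```

Because every file maps to the same three numbers, the vector can be passed directly to any tabular classifier; the limitation noted in the abstract is that such hand-picked features miss whatever they were not designed to capture.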
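Contribution 2 splits text at spaces and at every character that is neither a letter nor a digit. A minimal sketch of that rule with Python's `re` module follows. It assumes the special characters are kept as one-character tokens rather than discarded, since symbols such as `$`, `(` and `;` carry meaning in script code; the abstract does not say which choice the thesis makes.

```python
import re

# Assumption: non-alphanumeric, non-space characters become their own tokens.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def tokenize(source: str) -> list[str]:
    """Split at whitespace and at every character that is not a letter or digit."""
    return TOKEN_PATTERN.findall(source)


def whitespace_tokenize(source: str) -> list[str]:
    """Baseline: split at whitespace only."""
    return source.split()
```

With the baseline, `eval($_POST['cmd']);` is a single rare token that differs from `eval($_POST['x']);`, so each variant needs its own vocabulary entry. With `tokenize`, both reduce to shared tokens such as `eval`, `$` and `POST`, which keeps the vocabulary small and gives each token enough occurrences to learn a meaningful vector.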
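Contributions 3 and 4 both use word2vec to turn tokens into vectors. The abstract does not say whether the skip-gram or the CBOW variant is used, so the sketch below assumes skip-gram and shows only how training pairs are drawn from a token sequence: each token is paired with every neighbour within `window` positions (the default of 2 is an assumption). Real word2vec training adds steps not shown here, such as negative sampling or hierarchical softmax and random window shrinking.

```python
def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """(center, context) pairs for every token within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

For example, `skipgram_pairs(["eval", "(", "$"], window=1)` yields `("eval", "(")`, `("(", "eval")`, `("(", "$")` and `("$", "(")`. Training pushes tokens that share contexts toward similar vectors, so obfuscated calls that appear in similar surroundings end up close together.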
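Contribution 3 classifies word2vec-vectorized samples with a CNN. The abstract gives no architecture, so the sketch below assumes a common single-layer text CNN: each filter slides over the sequence of token vectors, a ReLU is applied, max-over-time pooling keeps each filter's strongest response, and one logistic unit outputs a Webshell probability. Only the forward pass is shown; the embeddings and weights in the usage are hypothetical values, not trained parameters.

```python
import math

Vector = list[float]


def conv1d_relu(embeddings: list[Vector], kernel: list[Vector], bias: float) -> list[float]:
    """Slide one filter of width len(kernel) over the token sequence (no padding), then ReLU."""
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sequence is shorter than the filter width")
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        total = bias
        for offset in range(width):
            # Dot product of one filter row with one token vector in the window.
            total += sum(w * x for w, x in zip(kernel[offset], embeddings[start + offset]))
        feature_map.append(max(0.0, total))
    return feature_map


def text_cnn_score(embeddings: list[Vector],
                   filters: list[tuple[list[Vector], float]],
                   out_weights: Vector,
                   out_bias: float) -> float:
    """Max-over-time pooling per filter, then a single logistic output unit."""
    pooled = [max(conv1d_relu(embeddings, kernel, bias)) for kernel, bias in filters]
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))  # interpreted as P(file is a Webshell)
```

With hypothetical 2-dimensional vectors `{"eval": [1.0, 0.0], "(": [0.0, 1.0], "$": [0.5, 0.5]}` and a width-2 filter `[[1.0, 0.0], [0.0, 1.0]]`, the feature map for `eval ( $` is `[2.0, 0.5]`. Max-over-time pooling makes the score independent of file length: a filter that fires on `eval (` reacts wherever that pattern occurs.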
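Contribution 4 combines a GRU with an attention mechanism to capture associations between words in the same line. The exact formulation is not given in the abstract, so the sketch below assumes one common choice: a GRU in the Cho et al. form (update gate `z`, reset gate `r`, candidate state `n`) reads the token vectors of one line, and attention scores each hidden state by its dot product with a learned context vector, applies a softmax, and returns the weighted sum. The parameter names (`Wz`, `Uz`, `bz`, and so on) are hypothetical, and training is not shown.

```python
import math

Vector = list[float]
Matrix = list[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Vector, h: Vector, p: dict) -> Vector:
    """One GRU update: h' = (1 - z) * n + z * h, with n = tanh(Wn x + Un (r * h) + bn)."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
    n = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def attention_pool(states: list[Vector], context: Vector) -> tuple[Vector, list[float]]:
    """Softmax over dot-product scores with a context vector, then a weighted sum of states."""
    scores = [sum(s * c for s, c in zip(state, context)) for state in states]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * state[i] for w, state in zip(weights, states))
              for i in range(len(states[0]))]
    return pooled, weights


def encode_line(tokens: list[Vector], p: dict, context: Vector) -> tuple[Vector, list[float]]:
    """Run the GRU over one line's token vectors and attend over its hidden states."""
    if not tokens:
        raise ValueError("a line must contain at least one token")
    h = [0.0] * len(p["bz"])
    states = []
    for x in tokens:
        h = gru_step(x, h, p)
        states.append(h)
    return attention_pool(states, context)
```

The attention weights returned alongside the pooled vector show which tokens in a line drove the decision, which is one reason attention is attractive for inspecting why a line was flagged.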