| With the rapid development of the Internet, Internet-based industries have grown quickly, and Web systems are now widely used in important services such as social networking and banking. However, Web systems are vulnerable to attacks that seriously affect economic security and social stability. Malicious code is one of the main forms of Web attack, so malicious code detection is a research hotspot in information security. A Webshell is a typical kind of malicious code: a back-end program that runs on a Web server, is highly concealed, and can cause great harm. Based on an analysis of typical Webshell attack methods, evasion methods, and existing detection methods, this paper proposes a Webshell detection model that combines statistical methods with deep learning. The feature vector includes both grammatical and semantic features. In the statistical method, keywords, special functions, longest words, and similar properties are used as grammatical features, and the term frequency of the opcode sequence of the Webshell code is used as the semantic feature. In the deep learning method, a word-encoding scheme is applied to the opcode sequence of the Webshell code, so that each opcode in the sequence is encoded as a feature vector of fixed dimension. The statistical method uses the random forest algorithm for detection, and the deep learning method trains a long short-term memory (LSTM) network. More than 5,000 normal PHP files and Webshells in total were collected from GitHub as the sample set. Experiments show that combining the two models through a weighted ensemble classifier yields better detection results. |
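
Two steps named in the abstract, the opcode term-frequency feature and the weighted combination of the two classifiers, can be illustrated with a minimal sketch. The abstract does not give the vocabulary, the weighting rule, or the decision threshold, so everything below is an assumption made for illustration: the function names are hypothetical, the ensemble is taken to be a convex combination of the two models' Webshell probabilities, and the opcode names in the usage lines are only examples.

```python
from collections import Counter


def opcode_term_frequency(opcodes, vocabulary):
    """Return the relative frequency of each vocabulary opcode in a sequence.

    `opcodes` is the opcode sequence of one PHP file and `vocabulary` is a
    fixed, ordered list of opcode names (both hypothetical inputs).
    """
    total = len(opcodes)
    if total == 0:
        # An empty sequence yields an all-zero feature vector.
        return [0.0] * len(vocabulary)
    counts = Counter(opcodes)
    return [counts[op] / total for op in vocabulary]


def weighted_ensemble(p_forest, p_lstm, w_forest=0.5):
    """Blend two Webshell probabilities with a convex weight.

    Assumption: the ensemble is w * p_forest + (1 - w) * p_lstm; the paper's
    actual weighting scheme is not stated in the abstract.
    """
    if not 0.0 <= w_forest <= 1.0:
        raise ValueError("w_forest must lie in [0, 1]")
    return w_forest * p_forest + (1.0 - w_forest) * p_lstm


def is_webshell(p_forest, p_lstm, w_forest=0.5, threshold=0.5):
    """Flag a file when the blended probability reaches the threshold."""
    return weighted_ensemble(p_forest, p_lstm, w_forest) >= threshold


# Usage with illustrative opcode names and made-up model outputs.
vocab = ["ECHO", "INCLUDE_OR_EVAL", "ASSIGN"]
sequence = ["ASSIGN", "INCLUDE_OR_EVAL", "ASSIGN", "RETURN"]
features = opcode_term_frequency(sequence, vocab)
flagged = is_webshell(p_forest=0.75, p_lstm=0.25, w_forest=0.5)
```

In practice the weight would be tuned on a validation set, and the two probabilities would come from the trained random forest and LSTM models rather than from constants.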