| A Webshell is a command execution environment that exists in the form of a web page file. Attackers often plant it among the normal files of a website system, where it serves as a backdoor that lets them remotely control the server through a browser. Because the malicious code of a Webshell accesses the server through port 80, it is not intercepted by the firewall, nor does it leave a record in the system log. It is highly concealed and is an important security threat to Web systems. How to detect Webshell malicious code effectively and accurately in the huge file system of a Web server has become one of the research hotspots in the Internet security field. This paper focuses on methods for detecting Webshell malicious code, proposes a Webshell detection scheme based on a comprehensive strategy, and verifies the effectiveness of the scheme through experimental analysis. The current mainstream Webshell detection methods are mainly based on signature matching, as in detection tools such as "Safe Dog" and "D Shield". However, because signature matching relies on a library of known malicious code signatures, such methods have difficulty detecting a Webshell after an attacker has obfuscated and encrypted it, so signature-based detection is not ideal in terms of accuracy and recall. The goal of this research is to analyze the characteristics of the code layer and of the underlying Opcode sequence in Webshell malicious code, and to propose two methods: a static detection method based on code-layer rules, and a detection method that combines Opcode sequences with machine learning algorithms. By combining the code-layer features with the underlying Opcode features, a Webshell detection method based on a comprehensive strategy is proposed, which detects malicious Webshell code from both layers with high detection efficiency. The models are trained and evaluated on a large number of Webshell samples in order to achieve higher accuracy and recall. The main work of this paper is as follows. First, in the code layer module, we collected the various forms of Webshell files that appear on the Internet, including a large number of encrypted and obfuscated Webshell files, and analyzed and compared the principles of mainstream Webshell detection tools and their evaluation metrics, such as accuracy. Based on a study of the diversity and complexity of obfuscated Webshell files, nearly ten rule-based detection modules are proposed, with the aim of achieving good detection results for unobfuscated and common Webshells. In addition, all the PHP code in the samples is extracted and preprocessed, and machine learning algorithms are trained on it and used for prediction. Second, in the Opcode module, the Opcode sequences of all samples are collected, cleaned, and processed, and a machine learning detection method based on Opcode sequences is proposed. Through experiments, the accuracy and recall of algorithms including Naive Bayes, decision tree, random forest, and K-nearest neighbor are tested both separately and in combination. Third, the two modules were tested and analyzed on a large number of samples, the text feature extraction algorithm was optimized to improve the accuracy and recall of the machine learning models, and the two modules were integrated to verify the overall detection effect. The experiments and analysis show that the detection accuracy of the Webshell detection method based on a comprehensive strategy reaches 98.8%. This demonstrates that the scheme can not only identify known Webshells, but also achieve good detection results for obfuscated and unknown Webshells. Future work will extend the research to other scripting languages such as ASP and JSP, enlarge the collected data set, and broaden the range of sample types. |
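
The Opcode-based detection idea summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: it assumes the Opcode sequences have already been dumped from the PHP samples by an external tool, it uses bigram counts as features (the n-gram size is an assumption), and it hand-writes a small multinomial Naive Bayes classifier so that it has no dependencies. The class name `OpcodeNaiveBayes`, the helper `opcode_ngrams`, and the toy training sequences are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the n-grams (as tuples) of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


class OpcodeNaiveBayes:
    """Multinomial Naive Bayes over opcode n-gram counts (Laplace smoothing)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram size (assumed, not taken from the paper)
        self.alpha = alpha  # smoothing constant

    def fit(self, sequences, labels):
        self.classes = sorted(set(labels))
        self.docs = Counter(labels)  # number of samples per class
        self.counts = {c: Counter() for c in self.classes}
        for seq, label in zip(sequences, labels):
            self.counts[label].update(opcode_ngrams(seq, self.n))
        # Vocabulary: every n-gram seen in any training sample.
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, seq):
        total_docs = sum(self.docs.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            denom = sum(self.counts[c].values()) + self.alpha * len(self.vocab)
            score = math.log(self.docs[c] / total_docs)  # log prior
            for gram in opcode_ngrams(seq, self.n):
                if gram in self.vocab:  # skip n-grams unseen in training
                    score += math.log((self.counts[c][gram] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, hand-made opcode sequences; real sequences would come from the samples.
train_sequences = [
    ["ASSIGN", "CONCAT", "ECHO", "RETURN"],
    ["INIT_FCALL", "SEND_VAR", "DO_FCALL", "ECHO", "RETURN"],
    ["FETCH_R", "FETCH_DIM_R", "INCLUDE_OR_EVAL", "RETURN"],
    ["FETCH_R", "FETCH_DIM_R", "ASSIGN", "INCLUDE_OR_EVAL", "RETURN"],
]
train_labels = ["benign", "benign", "webshell", "webshell"]

model = OpcodeNaiveBayes(n=2).fit(train_sequences, train_labels)
print(model.predict(["FETCH_R", "FETCH_DIM_R", "INCLUDE_OR_EVAL", "RETURN"]))
```

Working on Opcode n-grams rather than source text is what lets such a classifier remain useful when the PHP source has been obfuscated, since differently written sources can compile to similar Opcode patterns. The other classifiers named in the abstract (decision tree, random forest, K-nearest neighbor) could consume the same n-gram counts.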