| The development of information technology has made people's lives more convenient. However, as people store more and more sensitive data in cyberspace, websites that aggregate sensitive information have become targets for many intruders. A Webshell, a web page-based backdoor, is widely used to attack websites. Webshell detection methods are divided into static and dynamic methods. Traditional static detection methods usually target a single type of Webshell, rely on text features that represent Webshells poorly, and use detection algorithms with weak fitting ability, so their accuracy on obfuscated Webshell variants is low. This thesis therefore proposes a hybrid network detection model based on abstract syntax trees and Text-CNN for PHP and JSP Webshells. The main contributions of this study are as follows. First, a crawler collects samples from the Internet to build a dataset of PHP and JSP Webshells and normal scripts that supports the research in this thesis. To keep the experimental environment close to reality and make the detection model more robust, the normal samples in the dataset far outnumber the Webshells. The dataset contains 2177 Webshells and 11905 normal scripts, including the latest PHP and JSP Webshells and various "big trojan", "small trojan", and "one-sentence trojan" Webshells. Second, traditional Webshell detection schemes based on text features suffer from problems such as outdated malicious features. By studying new antivirus-evading ("anti-kill") Webshells, this study adds many recent PHP and JSP Webshell features and statistical features, and proposes a Webshell detection scheme based on XGBoost and text features. Comparative experiments with different machine learning algorithms show that this text-feature-based scheme can effectively detect common Webshells. Third, abstract syntax trees better reflect the actual logic of the code and can therefore effectively capture the characteristics of a script. By improving the traditional abstract syntax tree feature extraction scheme and combining it with Text-CNN, this study proposes a Webshell detection scheme based on an improved abstract syntax tree and Text-CNN. In comparative experiments with various machine learning algorithms and neural networks, the proposed scheme is the most effective at detecting PHP and JSP Webshells and achieves a detection accuracy of nearly 99.5% on the test dataset. Finally, from the perspective of engineering implementation, this study combines the advantages of the two detection schemes to design and implement a Webshell detection system. |
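
To make the idea of text and statistical features concrete, the following is a minimal sketch of how such features might be computed from a script's source before being passed to a classifier such as XGBoost. The feature set, the keyword list `SUSPICIOUS_CALLS`, and the function names are illustrative assumptions; the abstract does not specify which features the thesis actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list for illustration only; the thesis's actual
# malicious-feature set is not given in the abstract.
SUSPICIOUS_CALLS = (
    "eval", "assert", "base64_decode", "system", "shell_exec",
    "Runtime.getRuntime",
)


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for empty input."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(source: str) -> dict:
    """Compute a few simple statistical and keyword features of a script."""
    # Runs of alphanumeric/base64-style characters; long runs often
    # indicate encoded payloads.
    tokens = re.findall(r"[A-Za-z0-9_+/=]+", source)
    return {
        "length": len(source),
        "entropy": shannon_entropy(source),
        "longest_token": max((len(t) for t in tokens), default=0),
        "suspicious_calls": sum(source.count(name) for name in SUSPICIOUS_CALLS),
    }


if __name__ == "__main__":
    sample = '<?php eval($_POST["x"]); ?>'
    print(extract_features(sample))
```

In a pipeline of this kind, the resulting dictionaries would typically be converted to numeric feature vectors and used to train a gradient-boosted classifier; that step is omitted here because the model configuration is not described in the abstract.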