| Attacks on the Web application layer continue to emerge, and the security threat posed by WebShells is among the most damaging. Attackers use HTTP requests to upload and execute a WebShell and thereby take control of the compromised site. Existing WebShell scanners, however, have many shortcomings: a high missed-detection rate, no capacity for active defense, and susceptibility to bypass. WebShell detection and defense technology therefore needs further in-depth study. Designing a high-performance WebShell scanning and detection system helps prevent malicious attacks on Web applications and reduces the occurrence of Web security incidents. To classify and identify malicious requests that target the Web application layer, supervised learning methods are studied and a malicious-request classification model is proposed. It combines a TF-IDF word segmentation strategy over non-repetitive multiple N-grams with the random forest method, in order to overcome the short content and sparse features of request text. To identify WebShell backdoors, a three-tier scanning and recognition system is designed, consisting of fingerprint recognition, feature matching, and a Bayesian classification model. The model is trained on features extracted from a large number of samples collected from the Secrepo security database and other sources, and the reliability of the model and the system is verified on a test set. The experimental results show that, for short request text with little semantic content, the random forest classification model built on multi-N-gram segmentation reaches accuracy, recall, and F1 score above 98% on the test set. The three-tier WebShell scanning system recognizes more than 90% of the collected WebShell samples. Compared with similar products, it achieves a better WebShell detection rate and greater stability. |
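
The multi-N-gram TF-IDF features described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: N-grams are taken at character level with sizes 1 to 3, "non-repetitive" is read as counting each distinct N-gram once per request when computing document frequency, and the IDF uses the common smoothed form ln((1 + n) / (1 + df)) + 1. The helper names `multi_ngrams`, `fit_idf`, and `tfidf_vector` and the sample requests are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter

def multi_ngrams(text, sizes=(1, 2, 3)):
    """Return the character n-grams of every size in `sizes` for one request string."""
    grams = []
    for n in sizes:
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams

def fit_idf(corpus, sizes=(1, 2, 3)):
    """Compute smoothed IDF weights; each n-gram counts once per request (assumption)."""
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(multi_ngrams(text, sizes)))
    n_docs = len(corpus)
    return {g: math.log((1 + n_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}

def tfidf_vector(text, idf, sizes=(1, 2, 3)):
    """Map one request to an L2-normalised sparse TF-IDF vector (n-gram -> weight)."""
    # N-grams never seen while fitting the IDF table are ignored.
    counts = Counter(g for g in multi_ngrams(text, sizes) if g in idf)
    weights = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {g: w / norm for g, w in weights.items()} if norm else {}

# Hypothetical sample requests, for illustration only.
requests = [
    "/index.php?id=1",
    "/index.php?id=2",
    "/search?q=<script>alert(1)</script>",
]
idf = fit_idf(requests)
vec = tfidf_vector("/index.php?id=1 OR 1=1", idf)
```

Mixing several N-gram sizes is one way to obtain more features from short, low-semantic request strings than a single size would give. Vectors of this kind would then be passed to a random forest classifier, which this sketch leaves out.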