
Webshell Detection Based On Reinforcement Learning Feature Selection

Posted on: 2023-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Wu
GTID: 2558306848455624
Subject: Software engineering
Abstract/Summary:
Webshell, also known as a "Web backdoor", is a program written in a scripting language that gives an attacker remote access to a Web server. Hackers frequently use Webshells to attack Web servers, and they are currently one of the most serious threats to website security. Attackers upload Webshell scripts to Web servers by exploiting server vulnerabilities or bypassing weak security configurations, and then use them to launch further attacks that cause more serious damage. In the first half of 2021 alone, backdoors were implanted in around 13,000 websites in China, so identifying Webshells accurately and effectively is critical for network security protection. This paper studies Webshells written in PHP. First, we extract features from static PHP files and build a classification model for Webshell detection. Then, we use a Reinforcement Learning algorithm to make feature selection decisions, exploring feature subsets so as to maximize detection accuracy. In this way, automatic feature selection both improves the accuracy of Webshell detection and yields a selected feature subset. The main contents and contributions of this paper are as follows:

(1) This paper proposes a Webshell detection model, Text CNN-Attention, which combines a Text Convolutional Neural Network with a spatial Attention mechanism. First, we extract text features from static PHP files and fuse them to build the feature matrices that the detection model takes as input. Then, the spatial Attention mechanism is introduced into the Text Convolutional Neural Network. By learning during training, the model obtains a weight distribution over the different target regions of each feature matrix, which lets it pay more attention to the relevant regions during convolution and thus improves detection accuracy.

(2) To select the best feature subset for Webshell detection, this paper further optimizes the model and proposes a Webshell detection model based on automatic feature selection via Reinforcement Learning, Text CNN-Attention-RL. The Asynchronous Advantage Actor-Critic (A3C) algorithm serves as the feature selector, and the Webshell detection model serves as the classifier. The classifier performs Webshell detection with the feature subset chosen by the selector and feeds the detection accuracy back to the selector as a reward. On receiving this feedback, the selector updates the parameters of the A3C model. This interaction repeats until the A3C model converges, yielding the best feature subset for Webshell detection.

This paper collected and collated 7,056 benign samples and 2,084 Webshell samples from open-source frameworks and platforms. Under 10-fold cross-validation, the Text CNN-Attention model reached an accuracy, precision, recall and F1 score of 95.61%, 88.73%, 92.20% and 90.43%, respectively, outperforming the mainstream Webshell detection models it was compared with. Text CNN-Attention-RL builds on Text CNN-Attention. Experimental results show that, compared with Text CNN-Attention, it improves accuracy by 0.27% and F1 score by 0.46% while using 23.33% fewer features. This indicates that the proposed Reinforcement Learning-based automatic feature selection method can reduce redundant and irrelevant features to some degree, and that Text CNN-Attention-RL combines good dimensionality reduction with high detection performance.
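
The abstract says text features are extracted from static PHP files and fused into feature matrices, but it does not list the features. The following is a hypothetical sketch under stated assumptions, not the thesis's feature set: identifiers are tokenized with a simple regex; each token gets a hashed one-hot "lexical" view (the hashing trick, via `zlib.crc32` so it is deterministic) plus three hand-crafted attributes (length, membership in a hypothetical `SUSPICIOUS` function list, character entropy); the two views are fused by concatenation; and rows are zero-padded to a fixed `max_len` so the output has the rows-by-columns shape a TextCNN convolves over. All names (`feature_matrix`, `token_row`, `SUSPICIOUS`) are illustrative.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical list of functions often abused in PHP Webshells (illustrative, not from the thesis).
SUSPICIOUS = {"eval", "assert", "system", "exec", "shell_exec", "passthru",
              "base64_decode", "gzinflate", "str_rot13", "create_function"}

# Identifier-like tokens; `$` is not matched, so `$_POST` yields the token `_POST`.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of a string."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values()) if n else 0.0


def token_row(token: str, hash_dim: int) -> list[float]:
    """Fuse a hashed one-hot lexical view with hand-crafted attributes for one token."""
    lexical = [0.0] * hash_dim
    lexical[zlib.crc32(token.lower().encode()) % hash_dim] = 1.0  # deterministic hashing trick
    handcrafted = [
        float(len(token)),                                   # token length
        1.0 if token.lower() in SUSPICIOUS else 0.0,         # hypothetical "suspicious call" flag
        char_entropy(token),                                 # obfuscated names tend to score high
    ]
    return lexical + handcrafted  # feature fusion by concatenation


def feature_matrix(source: str, max_len: int = 16, hash_dim: int = 8) -> list[list[float]]:
    """Build a fixed-size max_len x (hash_dim + 3) matrix from PHP source text."""
    tokens = TOKEN_RE.findall(source)[:max_len]
    rows = [token_row(t, hash_dim) for t in tokens]
    width = hash_dim + 3
    rows += [[0.0] * width for _ in range(max_len - len(rows))]  # zero-pad short files
    return rows
```

For example, `feature_matrix('<?php eval(base64_decode($_POST["x"])); ?>')` yields five non-zero rows (for `php`, `eval`, `base64_decode`, `_POST`, `x`) followed by eleven padding rows; the `eval` and `base64_decode` rows carry the suspicious flag.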
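
The spatial Attention step in contribution (1) can be illustrated with a CBAM-style spatial attention, reduced to one dimension for a token sequence. Whether the thesis uses exactly this form is an assumption. The idea: pool the convolutional feature map across channels by mean and by max, convolve the two pooled rows with a small kernel, squash the result with a sigmoid to get one weight per position, and rescale every channel by those weights so high-scoring regions dominate later layers. In a real model `w_avg`, `w_max` and `bias` are learned; here they are passed in by hand, and `spatial_attention` is a hypothetical name.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(feature_map, w_avg, w_max, bias=0.0):
    """CBAM-style spatial attention reduced to 1-D.

    feature_map: list of C channels, each a list of L position activations
                 (for example, the output of a text convolution).
    w_avg, w_max: odd-length kernels applied to the channel-mean and channel-max rows.
    Returns (reweighted feature map, per-position attention weights).
    """
    channels, length = len(feature_map), len(feature_map[0])
    avg_row = [sum(ch[i] for ch in feature_map) / channels for i in range(length)]
    max_row = [max(ch[i] for ch in feature_map) for i in range(length)]
    half = len(w_avg) // 2
    weights = []
    for i in range(length):
        score = bias
        for j in range(len(w_avg)):
            p = i + j - half  # positions outside [0, length) act as zero padding
            if 0 <= p < length:
                score += w_avg[j] * avg_row[p] + w_max[j] * max_row[p]
        weights.append(sigmoid(score))
    # Every channel is scaled by the same per-position weight.
    reweighted = [[v * weights[i] for i, v in enumerate(ch)] for ch in feature_map]
    return reweighted, weights
```

With `feature_map = [[0, 0, 5, 0], [0, 0, 3, 0]]`, identity-like kernels `[0, 1, 0]` and `bias=-2`, position 2 receives a weight close to 1 while the other positions receive about 0.12, so the strong activation survives and the rest is damped.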
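
The selector/classifier feedback loop in contribution (2) can be sketched as below. This is deliberately not A3C: it is a single-worker, single-state advantage actor-critic simplification, in which the actor is one Bernoulli selection probability per feature, the critic is a scalar running estimate of the expected reward, and the advantage is the reward minus that estimate. `evaluate` is a hypothetical callback standing in for "train the classifier on the selected features and return its detection accuracy"; `toy_accuracy` is a synthetic stand-in so the sketch runs, and its numbers are not data from the thesis.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def select_features(n_features, evaluate, episodes=3000, lr_actor=0.5, lr_critic=0.1, seed=0):
    """Learn per-feature selection probabilities from a scalar reward.

    Actor: one logit per feature, mask_i ~ Bernoulli(sigmoid(logit_i)).
    Critic: scalar running estimate of the expected reward (the baseline).
    evaluate(mask) -> reward, higher is better (e.g. classifier accuracy).
    """
    rng = random.Random(seed)
    logits = [0.0] * n_features
    value = 0.0
    for _ in range(episodes):
        probs = [sigmoid(z) for z in logits]
        mask = [1 if rng.random() < p else 0 for p in probs]  # selector's action
        reward = evaluate(mask)                               # classifier's feedback
        advantage = reward - value
        for i in range(n_features):
            # d/d(logit_i) of log Bernoulli(mask_i; p_i) is (mask_i - p_i)
            logits[i] += lr_actor * advantage * (mask[i] - probs[i])
        value += lr_critic * (reward - value)                 # critic update
    return [sigmoid(z) for z in logits]


def toy_accuracy(mask, relevant=(0, 1, 2), penalty=0.1):
    """Synthetic reward: credit for relevant features, a small cost per irrelevant one."""
    hits = sum(mask[i] for i in relevant)
    extras = sum(m for i, m in enumerate(mask) if i not in relevant)
    return hits / len(relevant) - penalty * extras
```

Running `select_features(8, toy_accuracy)` and keeping features whose probability exceeds 0.5 recovers the three "relevant" features of the toy reward. A full A3C setup would instead run several asynchronous workers that share actor and critic networks, and might frame selection as a sequence of per-feature decisions with state; the feedback structure (accuracy as reward, parameter update, repeat until convergence) is the same.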
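
The reported metrics follow the standard confusion-matrix definitions, and F1 is the harmonic mean of precision and recall, so the figures can be cross-checked: 2 × 0.8873 × 0.9220 / (0.8873 + 0.9220) ≈ 0.9043, which matches the reported 90.43%. The sketch below computes these metrics with Webshell as the positive class and builds 10 folds over a label list with the paper's class counts. The thesis does not say whether its folds are stratified, so `stratified_kfold` (a hypothetical helper) is an assumption.

```python
import random


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` (Webshell) as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k folds that keep each class's proportion roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # round-robin: per-class counts differ by at most 1
    return folds
```

With `labels = [0] * 7056 + [1] * 2084` (0 = benign, 1 = Webshell), each of the 10 folds holds 208 or 209 Webshell samples; each fold serves once as the test set while the other nine train the model, and the per-fold metrics are then averaged.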
Keywords/Search Tags: Webshell detection, Feature fusion, Reinforcement learning, Attention, Feature selection