
Research On Efficient Malicious Code Family Classification Technology Based On Visualization

Posted on: 2024-03-23 | Degree: Master | Type: Thesis
Country: China | Candidate: C R Yin
GTID: 2568306941984349 | Subject: Computer technology
Abstract/Summary:
With the development of computers and the Internet, the level of informatization across industries is steadily rising. People's lives are closely tied to the Internet, and the amount of malicious software is also increasing rapidly, seriously threatening the security of cyberspace. If malicious software is allowed to spread, it not only interferes with people's daily lives but also seriously harms the normal operation of enterprises and society. Tracing malicious software to its family of origin, classifying it, and taking targeted countermeasures can therefore reduce the losses caused by malicious code.

Researchers have found that malicious code samples from the same family show a degree of similarity in code logic and coding habits. By comparing the similarity of feature codes, malicious code that uses the same attack method can be assigned to the same family. However, as anti-detection techniques mature, malicious code may use various obfuscation methods to hide its features, causing feature-based classification methods to fail. Obfuscation does not, however, fundamentally change the texture and structure of the image obtained by visualizing the code. Extracting features from malicious code images automatically with deep learning algorithms removes the reliance on expert prior knowledge and also minimizes the impact of obfuscation and similar techniques.

Existing classification schemes based on malicious code visualization still have two problems. First, binary grayscale images of different families can have similar textures, which easily leads to misjudgment. Second, as convolutional neural networks become deeper, the number of model parameters grows, resulting in high computational complexity and slow prediction.

To solve these problems, this thesis studies efficient malicious code family classification technology based on visualization. On the one hand, it improves the feature representation ability of binary grayscale images of malicious code, reduces the misjudgment rate between families with similar binary grayscale images, and improves classification accuracy. On the other hand, it designs and implements a lightweight classification scheme based on opcode visualization, which reduces the computational complexity and parameter count of the model and improves its prediction speed. The main research work is as follows.

(1) Existing binary grayscale images of malicious code have similar texture structures across different families. When a convolutional neural network is used alone, the extracted features lack the global context of the image, which makes misjudgment likely. This thesis therefore designs and implements a malicious code family classification scheme with strong feature representation, based on binary visualization. The Resformer classification network designed and implemented in this scheme combines the local, fine-grained feature extraction of convolutional neural networks with the global modeling ability of the Transformer. In the feature fusion stage, channel and spatial attention mechanisms and residual connections are introduced. Residual connections help avoid vanishing gradients, while the attention mechanisms weight the information carried by different spatial positions and channels so that the more important regions are highlighted, effectively improving the semantic representation of malicious code images. In ablation experiments on the public datasets BIG 2015 and Malimg, the Resformer classification network achieved 98.38% and 96.85% accuracy on the test sets, the best performance in the experiments.

(2) Existing research avoids the complex process of analyzing hand-crafted features by training neural network models. However, as the networks grow deeper, the number of model parameters and the amount of computation increase, consuming more computing resources. This thesis therefore designs and implements a lightweight malicious code classification scheme based on opcode visualization. First, a method for visualizing opcodes based on information gain and a co-occurrence matrix is proposed and implemented. The method filters the opcode features, retains those that contribute most to distinguishing malicious code families, and uses the co-occurrence matrix to visualize them as a single-channel grayscale image. Second, a lightweight convolutional classification network, MC-LCNN, is designed and implemented. By introducing depthwise separable convolution and dilated convolution, the network reduces both the number of model parameters and the computational complexity. Compared with existing lightweight malicious code family classification schemes, the computational complexity of the proposed scheme is 0.017G, about one tenth of theirs, which improves the prediction speed of the malicious code family classification model.
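
The opcode visualization step described in (2) can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis implementation: the abstract does not give the details, so the toy opcode sequences and family labels, the function names (`entropy`, `information_gain`, `select_opcodes`, `cooccurrence_image`), the use of a binary "opcode appears in the sample" feature for information gain, the definition of co-occurrence as adjacent pairs in the filtered sequence, and the scaling of counts to 0-255 are all hypothetical choices.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, labels, opcode):
    """Information gain of the binary feature 'opcode appears in the sample'
    with respect to the family label (assumed feature definition)."""
    present = [lab for seq, lab in zip(samples, labels) if opcode in seq]
    absent = [lab for seq, lab in zip(samples, labels) if opcode not in seq]
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - conditional


def select_opcodes(samples, labels, k):
    """Keep the k opcodes with the highest information gain (ties broken by name)."""
    vocab = sorted({op for seq in samples for op in seq})
    ranked = sorted(vocab, key=lambda op: (-information_gain(samples, labels, op), op))
    return ranked[:k]


def cooccurrence_image(seq, opcodes):
    """Build a k x k single-channel image from one opcode sequence.

    Cell (i, j) counts how often opcode i is immediately followed by opcode j
    after unselected opcodes are dropped; counts are scaled to the 0-255 range.
    """
    index = {op: i for i, op in enumerate(opcodes)}
    filtered = [op for op in seq if op in index]
    size = len(opcodes)
    counts = [[0] * size for _ in range(size)]
    for first, second in zip(filtered, filtered[1:]):
        counts[index[first]][index[second]] += 1
    peak = max(max(row) for row in counts)
    if peak == 0:
        return counts  # no selected pairs: all-black image
    return [[round(255 * c / peak) for c in row] for row in counts]


# Toy data (hypothetical): two samples each from two made-up families.
samples = [
    ["push", "mov", "call", "ret"],
    ["push", "mov", "call", "call", "ret"],
    ["xor", "mov", "jmp", "xor", "ret"],
    ["xor", "jmp", "mov", "xor", "ret"],
]
labels = ["A", "A", "B", "B"]

selected = select_opcodes(samples, labels, k=4)
image = cooccurrence_image(samples[1], selected)
for row in image:
    print(row)
```

In this toy data, opcodes such as `mov` and `ret` occur in every sample, so their information gain is zero and they are filtered out. The resulting image side length equals the number of retained opcodes, which is one way such a scheme could keep the classifier input small.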
Keywords: visualization of malicious code, malicious code classification, feature fusion, lightweight network