| In recent years, malicious code has evolved rapidly, and algorithms for identifying it face new problems and challenges. On the one hand, because malicious code has many variants, is generated quickly, and has a wide impact, traditional detection methods cannot meet the requirements of fast and efficient detection. On the other hand, analyzing and detecting malicious code with traditional machine learning is slow and inefficient, and classifier accuracy is low. How to identify malicious code families effectively and quickly has therefore become a research focus. The algorithm in this paper extracts malicious code feature information from an image perspective, fuses the features into a feature image based on a signature matrix, and uses a deep-learning-based model to recognize malicious code families. The research contents of this paper are as follows: (1) To address the low efficiency of current malicious code family recognition algorithms and the loss of feature information that can occur when feature images are scaled, this paper proposes a new malicious code visualization method. First, during data preprocessing, the malicious code samples are analyzed and three different static features are extracted: local feature information, assembly instruction set information, and visible character information. Second, using a locality-sensitive hashing algorithm, the features are mined and mapped to form a signature matrix, which is then transformed into the corresponding multi-channel feature image. In this way, each malicious code sample is mapped to one feature image. The method was tested on the BIG 2015 malicious sample data set with three different deep learning models based on ResNet, DenseNet and Perception. The experimental results show that the recognition accuracy for malicious code families improves by about 6.83% and that the method has clear advantages over using single features. (2) Because collecting and labeling enough data is costly, this paper designs a recognition model with high accuracy and strong generalization ability based on transfer-learning fine-tuning and a regularization method. First, the RepVGG structure is fine-tuned to suit the recognition task: the weights of the lower layers of the network are frozen and only the higher layers are trained, which greatly shortens the training time. Then, the fine-tuned network generates an abstract representation of each malicious code feature image by applying multiple nonlinear transformations. Finally, the Cutout regularization technique randomly masks a square region of each malicious code feature image in every epoch during training. The method was tested on the BIG 2015 malicious sample data set. The experimental results show that the method recognizes malicious code families well with low time consumption, achieving an accuracy of about 99.68% without requiring complex feature engineering. Even when very little training data is used, it still achieves an accuracy of about 98.25% on the test set. This paper presents an in-depth study of feature-image-based identification of malicious code families. The approach preserves the similarity within a malicious code family and the differences between families while avoiding the loss of feature information. It identifies malicious code families with high accuracy and speed, providing a feasible solution for effective malicious code family identification. |
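
The signature-matrix step in (1) can be illustrated with a minimal sketch. The abstract does not state which locality-sensitive hashing scheme is used or how signature values become pixels, so the sketch assumes MinHash over token sets, an 8x8 channel layout, and a modulo-256 pixel mapping. The function names and the sample tokens are hypothetical and are not taken from the paper.

```python
import hashlib


def minhash_signature(tokens, num_hashes):
    """Return a MinHash signature: one minimum hash value per seeded hash.

    Assumption: MinHash stands in for the unspecified LSH algorithm.
    """
    unique_tokens = set(tokens)
    signature = []
    for seed in range(num_hashes):
        # Each seed acts as a distinct hash function via the BLAKE2b salt.
        salt = seed.to_bytes(8, "little")
        values = [
            int.from_bytes(
                hashlib.blake2b(tok.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for tok in unique_tokens
        ]
        signature.append(min(values) if values else 0)
    return signature


def signature_to_channel(signature, side):
    """Reshape a signature of length side*side into a side x side grid of 0..255 values."""
    if len(signature) != side * side:
        raise ValueError("signature length must equal side * side")
    pixels = [value % 256 for value in signature]  # assumed pixel mapping
    return [pixels[row * side:(row + 1) * side] for row in range(side)]


def feature_image(feature_token_lists, side=8):
    """Build a multi-channel image with one channel per static feature type."""
    return [
        signature_to_channel(minhash_signature(tokens, side * side), side)
        for tokens in feature_token_lists
    ]


# Hypothetical tokens for the three static feature types named in the abstract.
local_features = ["4d5a", "9000", "0300", "0400"]
opcodes = ["push", "mov", "call", "jmp", "ret"]
visible_strings = ["kernel32.dll", "LoadLibraryA", "GetProcAddress"]

image = feature_image([local_features, opcodes, visible_strings])
```

Because the image size is fixed by the signature length rather than by the sample size, no scaling of the image is needed in this sketch, which is the property the abstract attributes to the signature-matrix approach.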
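
The Cutout step in (2) can be sketched as follows. This is a minimal illustration using nested Python lists, assuming the common formulation of Cutout in which a square mask is centered at a random pixel and clipped at the image border. The mask size, the fill value, and the function name `cutout` are assumptions, since the abstract does not give them.

```python
import random


def cutout(image, mask_size, rng=None, fill=0):
    """Return a copy of `image` with one randomly placed square region masked.

    `image` is a nested list indexed as [channel][row][column]. The same
    square is masked in every channel; parts of the square that fall outside
    the image are clipped.
    """
    if rng is None:
        rng = random.Random()
    height = len(image[0])
    width = len(image[0][0])

    # Pick the mask center uniformly over the image.
    center_y = rng.randrange(height)
    center_x = rng.randrange(width)
    half = mask_size // 2

    top = max(0, center_y - half)
    bottom = min(height, center_y - half + mask_size)
    left = max(0, center_x - half)
    right = min(width, center_x - half + mask_size)

    # Copy so the original feature image is left untouched.
    masked = [[row[:] for row in channel] for channel in image]
    for channel in masked:
        for y in range(top, bottom):
            for x in range(left, right):
                channel[y][x] = fill
    return masked


# Example: a 3-channel 8x8 image of ones with a 4x4 mask applied.
example = [[[1] * 8 for _ in range(8)] for _ in range(3)]
augmented = cutout(example, mask_size=4, rng=random.Random(0))
```

In a training loop, calling `cutout` again on each image in every epoch gives the network a differently masked view each time, which is the regularizing effect the abstract describes.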