| With the continuous expansion of network scale and the emergence of new technologies and applications, the Internet has become integrated into all aspects of social production and people’s lives. At the same time, the growing number of cyber attacks exposes cyberspace to unprecedented security threats. On the one hand, attackers continuously modify and improve existing malware using various obfuscation techniques, generating large numbers of malware variants that evade detection and making it difficult for traditional signature-based classification methods to respond effectively and promptly. On the other hand, attackers constantly exploit zero-day vulnerabilities in operating systems and applications, or use entirely new techniques to carry out attacks. As a result, unknown or new malware keeps emerging and growing in number, which makes it difficult for traditional classification methods to classify it effectively and consistently. Because malware attacks are highly concealed, diverse in their means, and continuously increasing, this paper focuses on three challenges in the open world: the difficulty of detecting known malware, the difficulty of classifying unknown malware, and the difficulty of continuously updating the classification model. To address these challenges, this paper studies known malware classification, unknown malware classification, and incremental learning and updating of the classification model, with the aim of constructing an effective intelligent classification platform and defense strategy. The goal is to continuously improve classification capabilities for malware attacks, support the construction of cyberspace security defense capabilities, and help secure our country’s critical information infrastructure. The research content and contributions of this paper are as follows: (1) More Accurate Detection: Research on Known Malware Classification. A malware classification method based on the 
fusion of local and global features is proposed, which addresses the failure of existing methods to fully mine the semantic information in the dynamic behaviour of malware. Local semantic features are extracted from the API call sequence by a stacked convolutional neural network, and global semantic features are extracted from the API semantic graph by a graph convolutional network. An ensemble learning strategy then integrates the local and global features, enriching them with comprehensive semantic information and improving malware classification performance. Experimental results show that this method effectively improves the classification performance for known malware. (2) Better Classification: Research on Unknown Malware Classification. a) A few-shot malware classification method based on multi-view flattening is proposed, which addresses the insufficient expressive power of malware visualized as an image and the weak generalization ability of few-shot models. At the data representation level, a multi-view, multi-channel malware image generation method is proposed so that malware images contain multi-view and multi-granularity information. At the model optimization level, the generalization ability of the model is enhanced by applying adaptive sharpness-aware minimization in the few-shot scenario, which minimizes both the loss value and the loss sharpness to find a flat local minimum of the objective function. Experimental results demonstrate that the proposed method significantly improves few-shot malware classification performance at both the data representation and model optimization levels. b) A few-shot malware classification method based on sample adaptation is proposed, which addresses overfitting caused by insufficient unknown or new malware samples, as well as interfering information between samples when 
measuring similarity. The convolution parameters are adaptively adjusted according to the input malware samples, yielding a sample-adaptive dynamic feature embedding that better represents the semantic information of the behaviour patterns contained in few-shot sets. A dual-sample dynamic activation function is proposed, whose parameters are adaptively adjusted according to the correlation between the query sample embedding and the class embedding. The query sample and class embeddings are then dynamically and non-linearly activated by this function to reduce interfering information between them, thereby improving the accuracy of the measurement. Experimental results show that the method is more effective than existing few-shot malware classification methods and achieves state-of-the-art performance. (3) Remember More: Research on Incremental Learning and Updating of the Classification Model. A few-shot class-incremental learning method based on knowledge reshaping is proposed. By applying feature enhancement in both the base class training stage and the new class adaptation stage, the malware classification model can be incrementally learned and updated in a few-shot scenario while alleviating catastrophic forgetting. In the base class training stage, a variational auto-encoder is used to simulate human memory and reshape historical knowledge, which improves the retention of historical knowledge and enhances the representation of new knowledge. In the new class adaptation stage, an objective function constraint enhances the ability of the graph model to accept and integrate new knowledge, further alleviating catastrophic forgetting. Together, these strategies make the classifiers learned in an individual session applicable to all classes. Experimental results show that this method effectively alleviates the catastrophic forgetting faced by the classification 
model during incremental learning and updating. Together, these studies address three key technologies for malware classification in the open world. Extensive experiments demonstrate the effectiveness and feasibility of the proposed methods for known malware classification, unknown malware classification, and incremental learning and updating of the classification model, and show their application prospects in an increasingly severe network security environment. |
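
As an illustration of the model optimization step in contribution (2) a), the following is a minimal sketch of one adaptive sharpness-aware minimization update, assuming the commonly published two-step form: perturb the weights toward higher loss within a weight-scaled neighbourhood, then descend using the gradient at the perturbed point. It may differ from the variant used in this work. The toy quadratic loss, the function names, and the hyperparameters `lr`, `rho`, and `eta` are all hypothetical placeholders, not values from the paper.

```python
import math

def loss(w):
    # Toy quadratic loss, a stand-in for a few-shot classification loss (assumption).
    return 0.5 * (3.0 * w[0] ** 2 + 0.5 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [3.0 * w[0], 0.5 * w[1]]

def asam_step(w, lr=0.1, rho=0.5, eta=0.01):
    """One adaptive sharpness-aware update (sketch, hypothetical hyperparameters)."""
    g = grad(w)
    # Element-wise adaptive scaling T_w = |w| + eta.
    t = [abs(wi) + eta for wi in w]
    tg = [ti * gi for ti, gi in zip(t, g)]
    norm = math.sqrt(sum(x * x for x in tg)) + 1e-12
    # Ascent step: eps = rho * T_w^2 * g / ||T_w * g||.
    eps = [rho * ti * x / norm for ti, x in zip(t, tg)]
    w_perturbed = [wi + ei for wi, ei in zip(w, eps)]
    # Descent step: update the original weights with the gradient
    # evaluated at the perturbed point.
    g_perturbed = grad(w_perturbed)
    return [wi - lr * gi for wi, gi in zip(w, g_perturbed)]

w = [1.0, -2.0]
initial_loss = loss(w)
for _ in range(50):
    w = asam_step(w)
final_loss = loss(w)
print(initial_loss, final_loss)
```

The toy loss has a single minimum, so the sketch only shows the mechanics of the update. In a real few-shot setting `grad` would come from automatic differentiation over the model parameters, and the preference for flat minima matters only on a non-convex loss surface.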