During the global fight against the pandemic, people rely more than ever on computers and networks to collaborate. Much as the novel coronavirus spreads in the physical world, malicious code is spreading faster and its variety is growing rapidly as emerging communication technologies such as 5G develop. Emerging security threats, most notably advanced persistent threat (APT) attacks, now have a significant impact on the information security of nations and enterprises. Traditional malware identification relies mainly on static program signatures, matching malicious samples by mapping each file to a specific signature. Meanwhile, tracing samples back to the APT organizations behind them still depends largely on manual analysis, and automation remains insufficient.

Given the major breakthroughs the Transformer model has brought to deep learning in recent years, this paper studies its application to malware analysis. Chapter 1 summarizes the state of malware research in China and abroad, and Chapter 2 introduces techniques commonly used in malware analysis. Many malware analysis methods still use the B2M algorithm to visualize malicious programs, but B2M itself introduces noise during image generation. To address this problem, Chapter 3 proposes a Transformer-based classification method built on fixed-size image features. It improves the visual mapping based on how malware images are generated, uses the Lambda attention mechanism to learn the positional relationships among textures in malware images, and validates generalization on two datasets. The proposed method achieves an accuracy of 99.30% on the Microsoft dataset. In Chapter 4, to address the insufficient automation of APT sample analysis, a fusion-feature analysis method combining text
features and image features is proposed. It integrates image and text information to classify samples by APT organization, and is deployed as a web-based end-to-end analysis service. Its text feature extractor combines a convolutional structure with a Transformer model, so it can flexibly handle malicious code sequences of varying lengths. Compared with other malware classification methods that use the full text sequence, it improves accuracy by 1.5%. On samples from six APT organizations, including APT-28, the method achieves a classification accuracy of 93.24%, providing an effective response to the security threat posed by APT attacks.
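For readers unfamiliar with B2M, the following is a minimal Python sketch of the baseline byte-to-pixel mapping. It assumes the common formulation in which each byte of the program becomes one 8-bit grayscale pixel and pixels are laid out in rows of fixed width. The file-size-to-width table is an assumption borrowed from the wider malware-image literature, not taken from this paper. The names `choose_width` and `b2m` are hypothetical. The sketch does not reproduce the improved mapping proposed in Chapter 3.

```python
import math
from typing import List, Optional

# Assumed width heuristic keyed on file size in KB (upper bound, row width),
# following a table commonly used in malware-image work; not from this paper.
WIDTH_TABLE = [
    (10, 32), (30, 64), (60, 128), (100, 256),
    (200, 384), (500, 512), (1000, 768),
]


def choose_width(num_bytes: int) -> int:
    """Pick an image width from the file size; files of 1000 KB or more get 1024."""
    size_kb = num_bytes / 1024
    for limit_kb, width in WIDTH_TABLE:
        if size_kb < limit_kb:
            return width
    return 1024


def b2m(data: bytes, width: Optional[int] = None) -> List[List[int]]:
    """Map raw bytes to a 2-D grayscale matrix, one byte per pixel (0-255).

    The last row is zero-padded to the full width; these padding pixels
    carry no information from the original program.
    """
    if width is None:
        width = choose_width(len(data))
    rows = math.ceil(len(data) / width)
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


if __name__ == "__main__":
    sample = bytes(range(256)) * 50  # 12,800 bytes, about 12.5 KB
    img = b2m(sample)
    print(len(img[0]), len(img))  # 64 200
```

The zero-padded final row illustrates one plausible source of the noise the abstract attributes to B2M: those pixels do not come from the program. Resizing the resulting variable-height matrix to a fixed model input size can introduce further interpolation artifacts.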