Research On Malware Detection Model Based On Transformer

Posted on:2024-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Wang

GTID:2568307103475224

Subject:Computer technology

Abstract/Summary:

With the advent of the intelligent era, the use of mobile operating systems has steadily increased. Among them, Android has become the most widely used because it is open source, which also makes it the main target of malicious attackers. Because malware is one of the main threats to network security, accurately identifying and classifying it is essential to protecting the financial security and privacy of Android users. In recent years, with the development of large-scale pre-trained models, applying them to Android malware detection has become a hot research topic. Malware detection based on large-scale pre-trained language models is now relatively mature. However, while such models improve the classification accuracy of malware detection, they also increase the computational load. In addition, the catastrophic forgetting that deep learning models suffer during repeated training poses a further challenge for malware detection research. This thesis addresses these problems from several perspectives: it optimizes the structure of the Transformer model to make the detection model easier to apply in real environments, optimizes the structure of the model input to mitigate catastrophic forgetting, and supplies pseudo-labels to assist the training of the detection model. The main research contents are summarized as follows:

(1) To address the problem that training malware detection models requires a large computational load and is difficult to apply in production environments, this thesis proposes a Transformer-based malware detection model with differentiable adaptive computation time, denoted “DACT-Trans MD”. The DACT-Trans MD model first improves the embedding layer of the Transformer to mine the time-series characteristics present in malware sample data, providing sufficient information for model training. Second, the Transformer encoder layers learn the relationships between the features of malware samples, avoiding the vanishing gradient problem of deep learning models and preserving the classification accuracy of the detection model. Finally, an improved differentiable adaptive computation time module dynamically halts computation in the deeper encoder layers, allowing the model to reach stable accuracy and stop early, thereby reducing the computational load required for training and for operation in production.

(2) To address the rapid turnover of malware samples and the catastrophic forgetting that occurs when the model is updated, this thesis proposes a Transformer-based semi-supervised continual learning model for malware detection, denoted “SSCL-Trans MD”. The SSCL-Trans MD model first uses an improved Lifelong Unsupervised Mixup algorithm to dynamically sample labeled historical samples and unlabeled new input samples and combine them into mixed samples, reducing the adverse effects of sample imbalance. Second, the Learning with Local and Global Consistency (LLGC) algorithm iteratively computes similarity scores for the unlabeled samples in the mixed set to obtain pseudo-labels. Finally, a multilayer perceptron is added to classify the malware and output the results.

(3) The DACT-Trans MD and SSCL-Trans MD detection models are designed and implemented in Python. For DACT-Trans MD, ablation experiments, comparative model experiments, and parameter sensitivity analysis were conducted on three datasets, demonstrating that the model can reduce training time while maintaining accuracy. For SSCL-Trans MD, ablation experiments, comparative model experiments, and parameter sensitivity analysis were conducted on four datasets. The results show that SSCL-Trans MD achieves better classification results in a semi-supervised continual learning scenario.
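
The pseudo-labeling step based on Learning with Local and Global Consistency can be illustrated with a minimal sketch. The abstract does not give the thesis's own implementation, so the code below shows only the standard form of the algorithm: it iterates F <- alpha * S * F + (1 - alpha) * Y over a symmetrically normalized affinity matrix S. The function name `llgc_pseudo_labels`, the Gaussian affinity, the values of `alpha` and `sigma`, and the toy data are all assumptions made for illustration.

```python
import math

def llgc_pseudo_labels(points, labels, n_classes, alpha=0.9, sigma=1.0, n_iter=200):
    """Assign a class to every point by iterating F <- alpha*S*F + (1-alpha)*Y.

    points: list of feature vectors (hypothetical toy features).
    labels: class index for labeled points, None for unlabeled ones.
    """
    n = len(points)
    # Gaussian affinity matrix W with a zero diagonal (assumed kernel).
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in W]
    S = [[inv_sqrt_deg[i] * W[i][j] * inv_sqrt_deg[j] for j in range(n)]
         for i in range(n)]
    # Y has a one-hot row for each labeled point and a zero row otherwise.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1.0 - alpha) * Y[i][c]
              for c in range(n_classes)]
             for i in range(n)]
    # The pseudo-label of each point is the class with the largest score.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]

# Toy data: two well-separated clusters with one labeled point each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
labels = [0, None, None, 1, None, None]
pseudo = llgc_pseudo_labels(points, labels, n_classes=2)
print(pseudo)
```

On this toy input each unlabeled point lies close to exactly one labeled point, so it inherits that neighbour's class. In a pipeline such as the one the abstract describes, the resulting pseudo-labels would then be passed to the downstream classifier together with the true labels.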

Keywords/Search Tags:

Malware Detection, Multi-label Classification, Transformer, Semi-supervised Learning, Continual Learning

Related items

1	Multi-label Image Classification Techniques Based On Semi-supervised Learning
2	Research And Application Of Image Classification Algorithm Based On Semi-supervised Learning
3	Distributed Semi-Supervised Learning
4	Research On Weakly-supervised Classification Methods Based On Samples And Labels Modeling
5	Research On Semi-Supervised Classification Based On Local Learning
6	Malware Traffic Classification Method Using Semi-supervised Learning
7	Research On Multi-label Text Classification Based On Semi-Supervised Learning
8	Research And Application Of Multi-label Learning Algorithm
9	AutoLink Semi-supervised Multi-label Study Of Literature Research And Implementation Methods
10	Research On Multi-label Classification With Incomplete Label Information