Research On Prototype Selection Techniques For Malware Family Classification

Posted on: 2022-08-08 | Degree: Master | Type: Thesis
Country: China | Candidate: G H Chen | Full Text: PDF
GTID: 2558307154975109 | Subject: Engineering
Abstract/Summary:
Malware detection and malware family classification are of great importance to network and system security. The wide adoption of deep learning models has greatly improved performance on both tasks. However, deep-learning-based methods rely heavily on large-scale, high-quality datasets, which require manual labeling. For malware, obtaining such a labeled dataset is extremely difficult because of the domain knowledge required. In this work, we propose to reduce the manual labeling effort by selecting a representative subset of instances that has the same distribution as the original full dataset and preserves the accuracy of the family classification task. Our method effectively reduces the labeling workload while keeping the accuracy degradation of the classification model within an acceptable threshold. We compare our method with random sampling on two widely adopted datasets, and the evaluation results show that our method achieves significant improvements over this baseline. In particular, with only 20% of the data selected, our method incurs only a 2.68% degradation in classification performance compared to the full set, while the baseline incurs a 6.78% performance loss. We also compare the effects of factors such as data quantity and model structure on the final results, providing some guidance for subsequent research.
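The abstract does not state which selection algorithm the thesis uses, so the following is only a minimal sketch of the general idea of prototype selection, with k-center greedy (farthest-point) selection swapped in as a stand-in technique. It assumes each sample has already been mapped to a numeric feature vector; the function names `k_center_greedy`, `select_prototypes`, and `random_baseline` are hypothetical. The per-family budget keeps family proportions equal to those of the full set, but k-center greedy itself targets coverage of the feature space and does not guarantee that the subset matches the full distribution within a family.

```python
import numpy as np


def k_center_greedy(features, budget, seed=0):
    """Pick up to `budget` row indices by farthest-point (k-center greedy) traversal."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    budget = min(budget, n)
    # Start from a random sample, then repeatedly add the point that is
    # farthest from every prototype chosen so far.
    selected = [int(rng.integers(n))]
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))
        if min_dist[nxt] == 0.0:
            # Every remaining point duplicates a chosen prototype; stop early.
            break
        selected.append(nxt)
        dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return np.array(selected)


def select_prototypes(features, families, fraction=0.2, seed=0):
    """Select roughly `fraction` of each family (assumed stratification step)."""
    chosen = []
    for fam in np.unique(families):
        idx = np.flatnonzero(families == fam)
        budget = max(1, int(round(fraction * len(idx))))
        local = k_center_greedy(features[idx], budget, seed)
        chosen.append(idx[local])
    return np.sort(np.concatenate(chosen))


def random_baseline(n, fraction=0.2, seed=0):
    """Random sampling baseline: draw `fraction` of the indices uniformly."""
    rng = np.random.default_rng(seed)
    size = int(round(fraction * n))
    return np.sort(rng.choice(n, size=size, replace=False))
```

In a workflow of this kind, only the samples returned by `select_prototypes` would be sent for manual labeling. Because the selection here is stratified by family, this sketch presumes some coarse grouping is already available (for example, cluster assignments); without one, `k_center_greedy` can be applied to the whole feature matrix directly.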
Keywords/Search Tags: Malware, Deep Learning, Prototype Selection