Research On A Malware Family Identification Method Based On Serial Architecture

Posted on:2023-06-26

Degree:Master

Type:Thesis

Country:China

Candidate:M Z Gao

Full Text:PDF

GTID:2568307058497494

Subject:Computer technology

Abstract/Summary:

With the continuous emergence of Advanced Persistent Threat (APT) attacks, network security faces severe challenges. As the main carrier of network attacks, malware poses a high level of threat. According to statistics from security vendors, the volume of malware is growing exponentially. Ransomware, backdoors, trojans, worms, illegal mining programs, and other types of malware continue to emerge. In addition, to avoid detection, malware authors introduce countermeasures such as polymorphism, advanced packers, encryption, and obfuscation into the components of malicious samples. These strategies make malware that belongs to the same family or the same attack organization look like unrelated files, so the complexity and evasiveness of malware steadily increase. As a result, the traditional automatic analysis methods used by security vendors suffer from high false-negative and false-positive rates.

To detect a wide variety of malware effectively, identify its families, and trace the attackers and even the attack organizations behind it, security researchers in China and abroad have proposed various static and dynamic analysis methods. For both academia and industry, however, effective identification of malware families remains an unresolved practical problem. According to the survey, three serious scientific problems still need to be solved: (1) static analysis cannot break through the restrictions imposed by strong encryption, packing, obfuscation, and other countermeasures; (2) the growth rate of malicious samples outpaces the rate at which they can be analyzed and detected; (3) the malware family labels assigned by different security vendors are confused and conflict with one another. Therefore, this thesis takes malware family identification as its core topic, divides it into the three scientific problems above, and studies each in turn. The contributions of this thesis are as follows:

· To address the problem that advanced packing, encryption, obfuscation, and other countermeasures severely limit the static analysis of malicious samples, this thesis proposes a static malicious code detection method based on multi-feature ensemble learning. By extracting features of malware at three different levels (bytes, instructions, and calls), the method maps the characteristics of executable files as fully as possible, approaching the upper limit of the feature space. An ensemble learning algorithm is used to improve the stability of the base models. Finally, a weighted voting algorithm corrects the errors that a single feature makes during generalization. The method achieved 96.99% accuracy on malicious samples protected by packers such as VMP and Aspack. This shows that the method not only performs well on malicious samples without unpacking, but also effectively alleviates the interference caused by advanced encryption, packing, obfuscation, and other countermeasures.

· To address the problem that the growth rate of malicious samples does not match the analysis rate in large-scale malware family classification tasks, this thesis proposes a sample data standardization method based on sample diversity screening. The method summarizes the redundant positions in PE (Portable Executable) files, then locates and extracts the code section that holds the crucial attack payload. The hash algorithm is thereby focused on the crucial positions of each sample, ignoring the influence of secondary file structure, and homologous malicious samples with the same attack payload are filtered out. In experiments on the BODMAS dataset, 26,069 of 57,293 samples were filtered out, a filtering rate of 45.5%. The method can therefore significantly reduce the number of malicious samples passed to subsequent analysis and improve its efficiency.

· To address the problem that family labels from multiple security vendors are confused and conflict with one another in malware family classification tasks, this thesis proposes a malware family label rectification method based on behavioral semantics. The method analyzes malicious samples dynamically in a sandbox and constructs a knowledge base of malware behavioral semantics offline, using the ATT&CK technique matrix and Windows programming documentation. The underlying API calls in the sandbox behavior reports are mapped to behavioral semantics, which are then grouped by an unsupervised clustering algorithm. Finally, clusters that show inconsistent labels are analyzed and reviewed to determine and locate the type of label problem and to rectify the family labels of the affected samples. Three different label problems were found in the BODMAS dataset, involving 2,003 malicious samples. After the family labels were rectified, accuracy improved from 80.8% to 83.4%, an increase of 2.6 percentage points, using the same features and the same model training.
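
The weighted voting step in the first contribution can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen, so the function name `weighted_vote`, the family labels, and the weight values below are assumptions for illustration only and should not be read as the thesis's implementation.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: dict mapping a feature-level model name to its predicted label
    weights: dict mapping the same model names to non-negative weights
             (hypothetical values; the abstract does not specify them)
    """
    scores = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += weights[model]
    # Iterate over labels in sorted order so that ties are broken
    # deterministically (alphabetically smallest label wins).
    return max(sorted(scores), key=lambda label: scores[label])


# One prediction per feature level: bytes, instructions, calls.
predictions = {"bytes": "familyA", "instructions": "familyB", "calls": "familyB"}
weights = {"bytes": 0.5, "instructions": 0.3, "calls": 0.3}
print(weighted_vote(predictions, weights))  # familyB (0.6 outweighs 0.5)
```

Here the byte-level model has the largest individual weight, but the two other models agree and together outweigh it, which is the sense in which voting can correct the error of a single feature.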
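
The sample screening idea in the second contribution, hashing only the code section so that files differing only in secondary structure collapse to one representative, might be sketched as follows. This is a simplified illustration under stated assumptions: the code-section bytes are taken as already extracted by some PE parser, SHA-256 stands in for whatever hash the thesis uses, and `dedupe_by_payload` is a hypothetical helper name.

```python
import hashlib


def dedupe_by_payload(samples):
    """Keep the first sample seen for each distinct code-section hash.

    samples: iterable of (sample_id, code_section_bytes) pairs. Extracting
    the code section from a PE file is assumed to happen elsewhere.
    """
    seen = set()
    kept = []
    for sample_id, code in samples:
        digest = hashlib.sha256(code).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(sample_id)
    return kept


# Synthetic byte strings standing in for extracted code sections.
samples = [
    ("s1", b"\x55\x8b\xec\x90"),
    ("s2", b"\x55\x8b\xec\x90"),  # same code bytes as s1, so it is filtered
    ("s3", b"\x33\xc0\xc3"),
]
kept = dedupe_by_payload(samples)
print(kept)  # ['s1', 's3']

# Filtering rate reported in the abstract for the BODMAS experiment.
filtered_ratio = 26069 / 57293
print(f"{filtered_ratio:.1%}")  # 45.5%
```

An exact hash only merges samples whose code sections are byte-for-byte identical; whether the thesis also tolerates small differences is not stated in the abstract.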

Keywords/Search Tags:

malware, hybrid analysis, multi-level architecture, family identification, label rectification

Related items

1	Android Malware Detection And Family Definition Method Based On Multi-dimensional Network Traffic Characteristics
2	Design And Implementation Of Multi-classification Tool For Android Malware Family Based On Knowledge Graph
3	Research On Multi-classification Scheme Of Android Malware Family
4	Research On Android Malware Detection And Malware Family Classification Based On Multi-context Features
5	Research Of Malware Homology Analysis Based On Family Gene Similarity
6	Multi-label Learning Algorithms Based On Local Pairwise Label Correlations And Its Application In Zhihu
7	Android Malware Family Classification Based On Dynamic Analysis
8	Research On The Classification Method Of Android Malware Family Based On Machine Learning
9	A Method Of Hybrid Analysis Of Android Malware Detection Based On Multi-feature
10	Android Malware Detection And Family Classification Based On Static Analysis