
Research on a Malware Family Identification Method Based on a Serial Architecture

Posted on: 2023-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: M Z Gao
GTID: 2568307058497494
Subject: Computer technology
Abstract/Summary:
With the continuous emergence of Advanced Persistent Threat (APT) attacks, network security faces severe challenges. As the main carrier of network attacks, malware poses a high level of threat. According to security vendors' statistics, the amount of malicious software is growing exponentially. Ransomware, backdoors, trojans, worms, illegal mining programs, and other types of malicious software continue to emerge. In addition, to avoid detection, malware authors introduce countermeasures such as polymorphism, advanced packers, encryption, and obfuscation into malicious sample components. These strategies make malware that belongs to the same family or the same attack organization look like different files, so the complexity and evasiveness of malicious software keep increasing. As a result, the traditional automated analysis methods used by security vendors suffer from high false negative and false positive rates.

To detect a variety of malicious software effectively, identify their families, and trace the attackers and even the attack organizations behind them, security researchers in China and abroad have proposed various static and dynamic analysis methods. For both academia and industry, however, effective identification of malware families remains an unresolved practical problem. According to the survey, three serious scientific problems remain to be solved: (1) static analysis cannot overcome the restrictions imposed by strong encryption, packing, obfuscation, and other countermeasures; (2) the analysis and detection rate cannot keep pace with the growth rate of malicious samples; (3) the malware family labels assigned by different security vendors are confused and conflict with one another. This thesis therefore takes malware family identification as its core, divides it into the three scientific problems above, and studies each in turn. The contributions of this thesis are as follows:

· To address the problem that advanced packing, encryption, obfuscation, and other countermeasures severely limit the static analysis of malicious samples, this thesis proposes a static malicious code detection method based on multi-feature ensemble learning. By extracting features of malicious software at three different levels (bytes, instructions, and calls), the method captures as much of an executable file's characteristics as possible, approaching the upper limit of the feature space. An ensemble learning algorithm is used to enhance the stability of the base models. Finally, a weighted voting algorithm corrects the generalization errors of the single-feature models. The method achieved 96.99% accuracy on malicious samples packed with packers such as VMP and Aspack. The method thus not only performs well on unpacked malicious samples but also effectively mitigates the interference caused by advanced encryption, packing, obfuscation, and other countermeasures.

· To address the problem that the growth rate of malicious samples outpaces the analysis rate in large-scale malware family classification tasks, this thesis proposes a sample data standardization method based on sample diversity screening. The method summarizes the redundant positions in PE (Portable Executable) files, then locates and extracts the code section that holds the crucial attack payload. The hash is computed over only these crucial regions of each sample, which ignores the influence of secondary file structure, and homologous malicious samples sharing the same attack payload are filtered out. In experiments on the BODMAS dataset, 26,069 of 57,293 samples were filtered out, a filtering rate of 45.5%. The method can therefore significantly reduce the number of malicious samples passed to subsequent analysis and improve its efficiency.

· To address the problem that family labels from multiple security vendors are confused and conflict with one another in malware family classification tasks, this thesis proposes a malware family label rectification method based on behavioral semantics. The method analyzes malicious samples dynamically in a sandbox and builds a knowledge base of malware behavioral semantics offline, using the ATT&CK technique matrix and Windows programming documentation. The underlying API calls in the sandbox behavior report are mapped to behavioral semantics, which are then grouped by an unsupervised clustering algorithm. Finally, clusters whose members carry inconsistent labels are analyzed and reviewed to determine and locate the type of label problem and to rectify the samples' family labels. Three different label problems were found in the BODMAS dataset, involving 2,003 malicious samples. After the malware family labels were rectified, accuracy improved from 80.8% to 83.4%, an increase of 2.6 percentage points, with the same features and the same model training.
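
The sample diversity screening step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes the code section bytes of each PE file have already been extracted by some other tool, it uses SHA-256 where the abstract only says "hash algorithm", and the names payload_digest, filter_homologous, and filter_rate are hypothetical.

```python
import hashlib


def payload_digest(code_section: bytes) -> str:
    """Hash only the code section bytes (assumption: SHA-256)."""
    return hashlib.sha256(code_section).hexdigest()


def filter_homologous(samples):
    """Split samples into kept and filtered lists.

    samples: iterable of (sample_id, code_section_bytes) pairs, where the
    code section is assumed to be extracted beforehand. The first sample
    seen for each digest is kept; later samples with the same digest are
    treated as homologous and filtered out.
    """
    seen = set()
    kept, filtered = [], []
    for sample_id, code in samples:
        digest = payload_digest(code)
        if digest in seen:
            filtered.append(sample_id)
        else:
            seen.add(digest)
            kept.append(sample_id)
    return kept, filtered


def filter_rate(n_filtered: int, n_total: int) -> float:
    """Fraction of samples removed by the screening step."""
    return n_filtered / n_total


# Toy input: "a" and "b" share the same code bytes, so "b" is filtered
# even if the rest of the two files (headers, resources) differed.
samples = [
    ("a", b"\x55\x8b\xec\xc3"),
    ("b", b"\x55\x8b\xec\xc3"),
    ("c", b"\x90\x90\xc3"),
]
kept, filtered = filter_homologous(samples)

# Figures reported in the abstract for BODMAS: 26,069 of 57,293 filtered.
bodmas_rate = filter_rate(26069, 57293)
```

Because only the code section is hashed, two files that differ in headers, resources, or padding but carry the same payload collapse to one digest, which is the effect the abstract attributes to ignoring secondary file structure. Applying filter_rate to the reported BODMAS counts gives roughly 0.455, matching the stated 45.5%.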
Keywords/Search Tags:malware, hybrid analysis, multi-level architecture, family identification, label rectification