Research on Malicious Code Classification Method Based on Knowledge Graph

Posted on: 2024-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q D He
GTID: 2568307106483174
Subject: Electronic information
Abstract/Summary:
A huge amount of malware circulates on the Internet and poses a serious threat to network security. According to the "Internet Security Threat Report" released by the CERT, 38% of network security incidents are caused by malware. Classifying and identifying malware at this scale therefore helps enterprises and institutions mount targeted security defenses, and it is important for protecting data and personal information. However, current malware classification methods suffer from incomplete sample feature extraction, a lack of intrinsic correlation between features, and a weak ability to describe malware behavior, which results in low classification accuracy. Against this background, this thesis exploits the clear relationship description, reasoning ability, and interpretability of knowledge graphs: it extracts malware entities and relationships, constructs a malware knowledge graph, and studies malware classification methods on that basis. The main work and research results of this thesis are as follows:

1. To address the incomplete extraction of malware behavior features and the difficulty of describing relationships between features, this thesis combines dynamic and static analysis to extract malware knowledge and organize it into a knowledge graph. First, the malware is run dynamically in a sandbox to obtain execution reports, such as the link libraries it loads and its registry operation records. Second, static reverse analysis is performed on the malware to extract static features such as the API sequence and the API call graph. Third, the results of static and dynamic analysis are fused to define the malware ontology and relationships and to generate the triples of the malware knowledge graph. Finally, these triples are stored in the Neo4j graph database to complete the construction of the malware knowledge graph. Because dynamic and static analysis complement each other, combining them extracts malware entity knowledge and relational knowledge more comprehensively.

2. To address the problem that the features used in malware classification are not comprehensive and cannot represent relationships between features, a malware classification method based on an API feature matrix is proposed. First, the API sequence of each malware sample is extracted by reverse analysis. Second, each API in the sequence is treated as a word, and Word2Vec is used for word embedding to obtain API word vectors that capture context semantics. Third, TransE is used to learn a representation of the malware knowledge graph and obtain API entity vectors. Fourth, the word vector and entity vector of the same API are fused to obtain a malware API feature matrix enhanced by the knowledge graph. Finally, the feature matrix is used as input to train a TextCNN classification model. Experimental results show that this method reaches an accuracy of 93.8% on the malware family classification task, which is higher than other malware classification methods based on API sequences.

3. To address the difficulty of expressing call relationships between APIs and the weak semantic representation of APIs in malware classification, a malware classification method based on the API call relationship graph is proposed. First, the malware is reverse-analyzed to extract the API functions and the call relationships between them. Second, the APIs are embedded using two techniques: knowledge graph representation learning and BERT. Third, the semantic vectors of the APIs are used as the nodes of a graph and the call relationships as its edges, and the graph is passed to a GCN to learn the characteristics of malware samples. Finally, the malware is classified. Experiments show that the proposed method reaches an accuracy of 86.2% on the malware classification task, which is higher than other malware classification methods based on API call relationships.
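
As an illustration of the second method, the sketch below shows one way the TransE idea and the vector fusion step could look in plain Python. It is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not say how the word vector and entity vector are fused, so concatenation is assumed here, and the API names, vector values, the `calls_next` relation, and the helper names `transe_score`, `fuse`, and `feature_matrix` are all hypothetical. The only part taken from the TransE model itself is the scoring rule, which treats a triple (head, relation, tail) as plausible when head + relation is close to tail.

```python
import math


def transe_score(head, relation, tail):
    """TransE distance ||head + relation - tail|| (L2); lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def fuse(word_vec, entity_vec):
    """Fuse the two views of one API by concatenation (an assumed fusion rule)."""
    return list(word_vec) + list(entity_vec)


def feature_matrix(api_sequence, word_vecs, entity_vecs):
    """Build one fused row per API call, in call order."""
    return [fuse(word_vecs[api], entity_vecs[api]) for api in api_sequence]


# Hypothetical toy embeddings; real ones would come from Word2Vec and TransE training.
word_vecs = {"CreateFileW": [0.1, 0.2], "WriteFile": [0.3, 0.4]}
entity_vecs = {"CreateFileW": [1.0, 0.0], "WriteFile": [1.0, 1.0]}
calls_next = [0.0, 1.0]  # hypothetical relation vector for "is followed by"

# Feature matrix for a three-call API sequence: 3 rows, 4 columns.
matrix = feature_matrix(["CreateFileW", "WriteFile", "CreateFileW"], word_vecs, entity_vecs)

# Distance of the triple (CreateFileW, calls_next, WriteFile) under TransE.
score = transe_score(entity_vecs["CreateFileW"], calls_next, entity_vecs["WriteFile"])
```

In this sketch each row of `matrix` corresponds to one API call, so the matrix has the sequence-by-embedding shape that a text-style convolutional classifier such as TextCNN expects as input. Other fusion rules (for example, element-wise addition of equally sized vectors) would fit the same description in the abstract.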
Keywords/Search Tags:Malware, Software reverse engineering, Application programming interface, Knowledge graph, Neural network