
Research On Source Code Vulnerability Detection Method Based On Graph Neural Network Combining Dynamic And Static

Posted on: 2023-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: H Hao
GTID: 2568306833989129
Subject: Software engineering
Abstract/Summary:
Software is widely used across industries, and vulnerabilities are unavoidable in software development. An automatic vulnerability detection tool with high accuracy is therefore urgently needed to find and fix as many vulnerabilities as possible before software is officially released. Traditional vulnerability detection methods are mainly rule-based or machine learning-based. Rule-based methods detect vulnerability variants and new vulnerability types poorly, because the rules must be formulated by human experts. Machine learning-based methods are powerful, but require human-defined vulnerability features. With the development of deep learning and code representation technology, code attribute graphs are used to express program semantics, and deep learning models such as Bidirectional Long Short-Term Memory (BLSTM) and Bidirectional Gated Recurrent Unit (BGRU) use the graph representation of the code to further improve detection accuracy. However, the control flow and data flow information in the code attribute graph covers only the static attributes of the code and does not reflect its runtime characteristics (that is, its dynamic attributes), so deep learning models based only on the code attribute graph inevitably suffer from high false negative and false positive rates. In addition, vulnerability detection based on dynamic analysis is not only inefficient, but also requires diverse vulnerability samples that cover enough code branches to achieve good detection results. The samples in existing standard vulnerability databases follow a single vulnerability pattern, which severely limits the detection capability of such methods.

To address these problems, this thesis proposes Hybrid VDS, a source code vulnerability detection method based on a Graph Neural Network (GNN) that combines dynamic and static analysis. On the one hand, the method addresses the single vulnerability pattern of the samples in vulnerability databases; on the other hand, it provides the detection model with sufficient static and dynamic syntactic and semantic information about the code, which reduces both the false positive rate and the false negative rate. The research content of this thesis is as follows:

(1) To address the low sample quality and single vulnerability pattern of standard vulnerability databases, a method for collecting vulnerability training data from open-source repositories is proposed. Candidate data is obtained by an initial screening of GitHub repositories on star rating, number of forks, repository size, and language used. Two types of models, Naive Bayes (NB) and BLSTM, then comprehensively evaluate the candidate data as a secondary screening, which effectively expands the data samples of the existing vulnerability database.

(2) To address the high false positive and false negative rates of current vulnerability detection methods, the Hybrid VDS method is proposed. For static features, the code attribute graph is used and three types of edges associated with the code attribute graph are added. For dynamic features, the calling sequence of the application programming interface (API) and the actual execution sequence of the code are extracted; the execution sequence includes the specific execution path, the value of each variable along the path, and the function parameter values. These static features and dynamic execution features are fed into the graph network, so that the vulnerability detection model can make full use of the static syntactic and semantic features and the dynamic semantic features of the vulnerable code.

(3) A prototype system based on Hybrid VDS is designed and implemented. The modules of the prototype system and their functions are analyzed, the key algorithms are introduced, and the complete training and detection process is described. Comprehensive experiments are conducted along several dimensions using multiple types of data from different sources. The results show that, by using dynamic features, the prototype system reduces the average false positive rate from 25.5% to 6% across the 22 types of vulnerability data in the standard repository dataset, while the average false negative rate decreases from 9% to 8%. Across the 10 types of vulnerability data in the open-source repository dataset, the average false positive rate decreases from 25.31% to 19.74%, and the average false negative rate decreases from 19.72% to 19.33%.
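The initial screening step in item (1) can be illustrated with a minimal Python sketch. It filters repository metadata on the four criteria the abstract names (star rating, number of forks, repository size, language). The `Repo` type, the threshold values, and the choice of C and C++ as target languages are all assumptions made for illustration; the abstract does not state them. The secondary screening with NB and BLSTM is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    """Hypothetical record of the metadata used for initial screening."""
    name: str
    stars: int
    forks: int
    size_kb: int
    language: str


# Assumed thresholds and languages; the abstract gives no concrete values.
MIN_STARS = 100
MIN_FORKS = 20
MAX_SIZE_KB = 500_000
TARGET_LANGUAGES = frozenset({"C", "C++"})


def passes_initial_screening(repo: Repo) -> bool:
    """Return True if the repository meets all four screening criteria."""
    return (
        repo.stars >= MIN_STARS
        and repo.forks >= MIN_FORKS
        and repo.size_kb <= MAX_SIZE_KB
        and repo.language in TARGET_LANGUAGES
    )


def initial_screening(repos: list[Repo]) -> list[Repo]:
    """Keep only the candidate repositories, preserving input order."""
    return [repo for repo in repos if passes_initial_screening(repo)]


if __name__ == "__main__":
    sample = [
        Repo("popular-c-lib", stars=2500, forks=300, size_kb=40_000, language="C"),
        Repo("tiny-script", stars=3, forks=0, size_kb=120, language="Python"),
    ]
    for repo in initial_screening(sample):
        print(repo.name)
```

In such a design, the cheap metadata filter runs first so that the more expensive model-based evaluation only sees a reduced candidate set.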
Keywords/Search Tags:Vulnerability Detection, Graph Neural Networks, Static Analysis, Dynamic Analysis