Vulnerability Detection Based On Program Slicing And Deep Learning

Posted on: 2024-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lu
Full Text: PDF
GTID: 2558307136495214
Subject: Software engineering
Abstract/Summary:
With the widespread use of the Internet and of software applications, vulnerabilities can lead to serious security threats and losses, so advances in vulnerability detection are crucial for securing software systems. Because software evolves and iterates quickly, detection techniques must adapt rapidly to new versions and updates. Deep learning models can learn features and patterns from large collections of code and vulnerability samples and use them to identify and predict potential vulnerabilities. However, existing deep learning-based vulnerability detection methods operate at a coarse granularity, typically at the file, function, or line level, which results in lower accuracy and higher false positive rates. To overcome these challenges and improve detection performance, this study combines deep learning with program slicing.

Existing slicing-based vulnerability detection methods capture only the semantic information of the code, which may lead to incomplete and inaccurate identification of potential vulnerabilities. To address this, a vulnerability detection method based on static slice semantics and slice metrics over LLVM IR is studied. An algorithm for detecting vulnerability points based on LLVM IR is proposed. The symbolic slicing tool SymPas is employed to obtain slices of interest, and the inst2vec method is used to obtain vector representations of LLVM IR instructions. The method combines instruction-level slice metrics such as density, overlap, coverage, and cognitive complexity to analyze in depth the relationships and features between slice statements. A hybrid model, ResCNN-GRU, is constructed for training. It integrates and learns the extracted features, making full use of both the semantic and the metric information of the slices and thereby further improving detection performance. Because it uses LLVM IR as its intermediate representation, the approach demonstrates good adaptability, and it achieves favorable results in terms of accuracy and false negative rate.

Current static slicing methods are limited in code vulnerability detection, especially in discovering vulnerabilities that arise only under specific execution conditions. To address this, a code vulnerability detection method based on dynamic slicing and pre-trained models is studied. Dynamic slicing reflects the actual execution behavior of a program, making it more realistic and accurate, so it can discover vulnerabilities that occur only under specific execution conditions. Dynamic slicing can also represent programs as finer-grained code blocks, capturing the semantic features of the code in more detail. The CodeBERT pre-trained model is used to represent the sliced code blocks as two-dimensional tensors. Each represented block is then treated as a grayscale image, encoding the structure and semantic information of the code as pixel values. The benefit of this approach is that it captures the semantic features of the code more comprehensively, including local structure and contextual information. The computer vision model Swin Transformer then extracts critical vulnerability features from the images. Because the vision model can effectively learn important features in the images, this helps identify defects in the code accurately. By combining dynamic slicing with pre-trained models, the proposed method captures both the semantic features of the code and key vulnerability features, thereby improving the accuracy of code vulnerability detection and reducing false positive rates.

Both proposed methods were experimentally validated, and the results demonstrate their effectiveness in code defect detection. The experiments showed a significant reduction in false positives and false negatives, improving the accuracy and reliability of vulnerability detection. The methods also exhibited good adaptability, expanding the application scope of vulnerability detection.
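
The abstract names slice metrics such as overlap and coverage but does not define them. As a rough illustration only, the sketch below computes these two metrics using the conventional slice-based cohesion definitions (coverage as mean slice size relative to module length, overlap as the mean share of each slice that is common to all slices). These definitions, the function names, and the sample data are assumptions for illustration, not the thesis's instruction-level formulation. Density and cognitive complexity are omitted because the abstract gives no basis for defining them.

```python
from typing import List, Set


def slice_coverage(slices: List[Set[int]], module_length: int) -> float:
    """Mean slice size divided by module length (assumed definition)."""
    if not slices or module_length <= 0:
        raise ValueError("need at least one slice and a positive module length")
    return sum(len(s) for s in slices) / (len(slices) * module_length)


def slice_overlap(slices: List[Set[int]]) -> float:
    """Mean fraction of each slice shared by every slice (assumed definition)."""
    if not slices or any(len(s) == 0 for s in slices):
        raise ValueError("need at least one slice and no empty slices")
    common = set.intersection(*slices)  # instructions present in all slices
    return sum(len(common) / len(s) for s in slices) / len(slices)


# Hypothetical data: each slice is the set of instruction indices it contains,
# taken from a module of 10 instructions.
example_slices = [{1, 2, 3, 4}, {2, 3, 4, 5, 6}, {3, 4, 7}]
example_coverage = slice_coverage(example_slices, module_length=10)
example_overlap = slice_overlap(example_slices)
```

In a pipeline like the one described, scalar values of this kind would be supplied to the model alongside the instruction embeddings; how the thesis actually combines them is not stated in the abstract.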
Keywords/Search Tags: Static Slicing, Slice Metrics, Pre-trained Model, Dynamic Slicing, Swin Transformer