
Deep Learning-Based Static Binary Vulnerability Prediction Technology

Posted on: 2023-07-05  Degree: Master  Type: Thesis
Country: China  Candidate: B J Duan  Full Text: PDF
GTID: 2568307169479564  Subject: Computer Science and Technology
Abstract/Summary:
Static vulnerability prediction uses static analysis to predict software vulnerabilities. It is commonly used in software evaluation and can also serve as an auxiliary guide that helps dynamic analysis tools improve their efficiency. Early static analysis methods include value-set analysis, symbolic execution, and similar techniques. These methods have high complexity and often demand substantial expertise and extensive domain knowledge from researchers. With the rise of deep learning applications, combining deep learning with vulnerability prediction has become an active research topic. However, most research works are based on program source code. Because programs in practical use are rarely open-source and their source code is difficult to obtain, it is both necessary and urgent to design a static vulnerability prediction method based on binary programs.

Existing works that apply deep learning to binary static vulnerability prediction fall into two categories: bytecode-based and disassembly-based methods. The former analyzes the binary program directly, but it can only make program-level predictions because code and data are mixed in the binary and readability is low. The latter obtains the assembly code of the program through a disassembly tool and then applies deep learning methods for vulnerability prediction. Assembly code is more readable than binary code and allows more fine-grained predictions. However, disassembly-based vulnerability prediction research still faces many problems and challenges: 1) the extraction of vulnerability features is not comprehensive, and most models do not consider the statistical properties or semantic features of the assembly code; 2) the relationships between basic blocks are usually ignored when semantics are extracted; 3) the features of branch structures in control flow are hard to capture; 4) the related research is closed-source and public standard binary datasets are lacking, which makes it difficult to compare model performance.

In light of the above, this paper proposes a static binary vulnerability prediction technology based on deep learning. The features of assembly functions are divided into statistical features, semantic features, and structural features. The technology uses deep learning methods to extract and fuse each kind of feature and performs function-level vulnerability prediction for binary programs. The main contributions of this paper are as follows. First, a multi-feature fusion static binary vulnerability prediction framework is proposed, which is the first in the vulnerability prediction field to integrate statistical, semantic, and structural features. Second, a path-sensitive pre-trained model for assembly language is proposed. We bring the pre-training idea from natural language processing into vulnerability prediction, design the SPP (Same Path Prediction) task, implement a pre-trained language model for assembly language, and improve the model's ability to comprehend assembly semantics. Third, a new structural feature extraction network is designed based on the self-attention mechanism and a convolutional neural network. While incorporating the original graph features of the assembly function, each basic block can be trained to automatically obtain information from other related basic blocks. The convolutional neural network then integrates the features of all basic blocks in a function.

Because it relies on deep learning methods, the model can automatically learn binary program vulnerabilities without much prior knowledge from researchers. Moreover, because the learning process does not depend on the characteristics of any particular vulnerability, the proposed method can be applied to various kinds of vulnerabilities. The experiments show that we achieve a recall of 80.5% and K-200 to K-1000 accuracies all above 97%, which are 12 percentage points and nearly 20 percentage points higher, respectively, than those of the state-of-the-art V-Fuzz.
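The two reported metrics can be illustrated with a minimal sketch. The abstract does not define "K-N accuracy"; the code below assumes it means the fraction of truly vulnerable functions among the N functions with the highest predicted vulnerability scores. The function names `recall` and `top_k_accuracy` and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def recall(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of truly vulnerable functions (label 1) that were predicted vulnerable."""
    true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    positives = sum(1 for y in labels if y == 1)
    return true_positives / positives if positives else 0.0


def top_k_accuracy(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """Assumed meaning of K-N accuracy: share of vulnerable functions in the top-k ranking."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank function indices by predicted vulnerability score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(labels[i] for i in top) / len(top)


# Toy data (hypothetical): 1 = vulnerable function, 0 = benign function.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
# Binarize the scores at an assumed threshold of 0.5.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(recall(labels, predictions))
print(top_k_accuracy(labels, scores, k=2))
```

Under this reading, recall measures how many vulnerable functions are found overall, while K-N accuracy measures how trustworthy the top of the ranking is, which matters when the ranking is used to prioritize targets for a dynamic analysis tool.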
Keywords/Search Tags: Vulnerability Prediction, Deep Learning, Neural Network, Natural Language Processing