Deep Learning-Based Static Binary Vulnerability Prediction Technology

Posted on:2023-07-05

Degree:Master

Type:Thesis

Country:China

Candidate:B J Duan

GTID:2568307169479564

Subject:Computer Science and Technology

Abstract/Summary:

Static vulnerability prediction uses static analysis to predict software vulnerabilities. It is commonly used in software evaluation and can also guide dynamic analysis tools to improve their efficiency. Early static analysis methods include value-set analysis, symbolic execution, and similar techniques. These methods are highly complex and demand considerable expertise and background knowledge from researchers. With the rise of deep learning applications, combining deep learning with vulnerability prediction has become a hot research topic. However, most existing work operates on program source code. Because programs used in practice are rarely open source and their source code is hard to obtain, a static vulnerability prediction method that works on binary programs is urgently needed.

Existing work applying deep learning to static binary vulnerability prediction falls into two categories: bytecode-based and disassembly-based methods. The former analyzes the binary program directly, but because code and data are mixed in the binary and its readability is low, it can only make program-level predictions. The latter obtains the program's assembly code through a disassembly tool and then applies deep learning to predict vulnerabilities. Assembly code is more readable than binary code and allows finer-grained predictions. However, disassembly-based vulnerability prediction still faces many problems and challenges: 1) vulnerability features are not extracted comprehensively, and most models do not consider the statistical properties or semantic features of the assembly code; 2) the relationships between basic blocks are usually ignored when semantics are extracted; 3) the features of branch structures in control flow are hard to capture; 4) the related research is closed-source and public standard binary datasets are lacking, which makes it difficult to compare model performance.

In light of the above, this thesis proposes a static binary vulnerability prediction technology based on deep learning. The features of assembly functions are divided into statistical features, semantic features, and structural features. The technology uses deep learning methods to extract and fuse each kind of feature and performs function-level vulnerability prediction for binary programs. The main contributions are as follows. First, a multi-feature fusion framework for static binary vulnerability prediction is proposed, the first in the vulnerability prediction field to integrate statistical, semantic, and structural features. Second, a path-sensitive pre-trained model for assembly language is proposed. We bring the pre-training idea from natural language processing into vulnerability prediction, design the SPP (Same Path Prediction) task, build a pre-trained language model for assembly language, and improve the model's ability to comprehend assembly semantics. Third, a new structural feature extraction network is designed based on the self-attention mechanism and a convolutional neural network. While incorporating the original graph features of the assembly function, each basic block can be trained to automatically obtain information from other related basic blocks. The convolutional neural network then integrates the features of all basic blocks in a function.

Because it relies on deep learning, the model can learn binary program vulnerabilities automatically without much prior knowledge from researchers. Moreover, since the learning does not depend on the particular characteristics of any one kind of vulnerability, the proposed method can be applied to various vulnerabilities. The experiments show that we achieve a recall of 80.5% and K-200 to K-1000 accuracies all above 97%, which are 12 percentage points and nearly 20 percentage points higher, respectively, than those of the state-of-the-art V-Fuzz.
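
The abstract reports recall and K-200 to K-1000 accuracies but does not define them. The following is a minimal sketch under stated assumptions: "K-N accuracy" is taken to mean the fraction of the N highest-scored functions that are truly vulnerable, and recall is computed at a score threshold of 0.5. The function names, the threshold, and the toy data are hypothetical and do not come from the thesis.

```python
from typing import Sequence


def recall(labels: Sequence[int], scores: Sequence[float],
           threshold: float = 0.5) -> float:
    """Fraction of truly vulnerable functions (label 1) scored >= threshold.

    The 0.5 threshold is an assumption made for illustration.
    """
    positive_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not positive_scores:
        return 0.0
    hits = sum(1 for s in positive_scores if s >= threshold)
    return hits / len(positive_scores)


def top_k_accuracy(labels: Sequence[int], scores: Sequence[float],
                   k: int) -> float:
    """Fraction of the k highest-scored functions that are truly vulnerable.

    This is an assumed reading of the "K-200 ~ K-1000" metric.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank functions by predicted vulnerability score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(y for _, y in top) / len(top)


# Toy data: six functions, label 1 = vulnerable (hypothetical values).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print("recall:", recall(labels, scores))
print("top-3 accuracy:", top_k_accuracy(labels, scores, k=3))
```

Under this reading, a high K-N accuracy means that the functions ranked most suspicious are mostly real vulnerabilities, which is what matters when the ranking is used to prioritize further analysis.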

Keywords/Search Tags:

Vulnerability Prediction, Deep Learning, Neural Network, Natural Language Processing

Related items

1	Research Of Vulnerability Prediction Based On Deep Learning And Robustness Verification On Deep Learning Model
2	From Code To Natural Language: Type-aware Sketch-based Seq2seq Learning
3	Joint Learning Methods For Distributed Representations Of Natural Language
4	Natural Language Processing Of Ancient Books Of Chinese Traditional Medicine Based On Deep Learning
5	Research And Application On Method Of Generating SQL Through Natural Language Based On Deep Learning
6	Deep Learning-Based Software Vulnerability Prediction
7	Research On The Method Of Robotic Object Detection Base On Natural Language Expression
8	Research On Software Vulnerability Prediction Method Based On Deep Transfer Learning
9	Research On Optimization Of Deep Learning Model For Natural Language Processing
10	Sign Language Recognition And Gait Prediction Based On Deep Learning