
Research And Implementation Of Source Code Vulnerability Detection Method Based On Deep Learning

Posted on: 2024-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
Full Text: PDF
GTID: 2568307136489034
Subject: Computer Science and Technology
Abstract/Summary:
With the development and application of computer software, the use of computer systems is increasing exponentially, and software vulnerabilities threaten the security of those systems. On the one hand, current software is diverse and complex, so users encounter different situations when using it, which exposes different vulnerabilities and increases their overall number. On the other hand, developers unintentionally leave vulnerabilities in the code they write, and attackers can exploit them. A vulnerability detection method learns the relevant characteristics of vulnerable functions through a learning model and determines whether the software source code contains a vulnerability. However, a single learning model places relatively strict requirements on the usage scenario in practice, has limited representation ability and detection efficiency for vulnerability features, and suffers from high false negative and false positive rates. Combining models and fusing multiple features can address this problem, which makes research on deep-learning-based source code vulnerability detection significant. The main work of this paper is as follows:

(1) To obtain a deep representation of the source code text of vulnerable functions, this paper proposes a vulnerability detection method based on DistilBERT-LSTM and Multinomial Naive Bayes (DBM-MNB), which combines a DBM part and an MNB part. The method generates a corresponding sequence for the source code and uses DistilBERT-LSTM to mine the local key features and global temporal features of the vulnerability, capture the dependencies between the statements of the vulnerable code, and obtain the probability that a vulnerability exists. For samples that are difficult to classify, the model refines detection with Multinomial Naive Bayes: it preprocesses the data with a TF-IDF vectorizer, selects features with a chi-square test, and passes the result to the Multinomial Naive Bayes classifier to obtain the final detection result. Comparative experiments show that the DBM-MNB model mines vulnerability features more deeply and achieves better performance metrics than some current combined learning models.

(2) To characterize vulnerable functions comprehensively, this paper proposes a source code vulnerability detection method based on multi-feature fusion (CHR-CodeBERT). The method extracts multiple features of the source code in both hierarchical and vertical dimensions, including convolutional neural network features, hierarchical attention features, and recurrent neural network features. Feature engineering is performed at the word, sentence, and document levels to construct three feature matrices, which are fused into a single feature matrix through an additive attention mechanism. In parallel, the source code sequence is given special tokens, padded, and paired with a generated mask to construct the input feature matrix for CodeBERT. Both feature matrices are fed into the CodeBERT model for classification to detect vulnerabilities. Comparative experiments show that CHR-CodeBERT achieves better overall performance metrics than some current single-feature vulnerability detection models.

(3) Based on the two methods above, this paper designs a deep-learning-based source code vulnerability detection system. The system consists of a file upload module, a source code processing module, a vulnerability detection module, and a result display module. The file upload module accepts source code as a single file, multiple files, or a compressed archive. The source code processing module cleans and preprocesses the uploaded files and converts the source code into a form the models can consume. The vulnerability detection module calls the DBM-MNB and CHR-CodeBERT models to determine whether the source code contains vulnerabilities and stores the detection results. The result display module visualizes the detection results and supports querying and downloading saved historical data through the query module. System testing shows that the system is practical.
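
The input preparation described in (2), adding special tokens, padding, and generating a mask, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name build_model_input, the max_len parameter, and the pre-tokenized example are assumptions made for illustration. The token strings follow the RoBERTa-style convention used by the CodeBERT tokenizer, and a real pipeline would operate on subword ids rather than raw strings.

```python
from typing import List, Tuple

# RoBERTa-style special tokens, as used by the CodeBERT tokenizer.
CLS, SEP, PAD = "<s>", "</s>", "<pad>"


def build_model_input(tokens: List[str], max_len: int) -> Tuple[List[str], List[int]]:
    """Add special tokens, truncate or pad to max_len, and build the attention mask.

    Hypothetical helper for illustration only; not taken from the thesis.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for the two special tokens")
    # Truncate so the start and end tokens still fit.
    body = tokens[: max_len - 2]
    sequence = [CLS] + body + [SEP]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    mask = [1] * len(sequence)
    pad_count = max_len - len(sequence)
    sequence += [PAD] * pad_count
    mask += [0] * pad_count
    return sequence, mask


if __name__ == "__main__":
    # Hypothetical, already-tokenized C statement.
    code_tokens = ["strcpy", "(", "buf", ",", "src", ")", ";"]
    seq, mask = build_model_input(code_tokens, max_len=12)
    print(seq)
    print(mask)
```

Fixing every sequence to the same length lets a batch of functions be stacked into one matrix, while the mask keeps the padded positions out of the attention computation.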
Keywords/Search Tags:Deep learning, Source code representation, Vulnerability mining, Language model