| Software vulnerabilities have always been a major threat to software companies and users. As software security incidents and their impact on users have come to light, software security has attracted increasing attention from the public and the media. To fix vulnerabilities as soon as possible and prevent attackers from exploiting them, they must be found in time, which has driven the continuous development of vulnerability detection technology. In recent years, machine learning and deep learning have developed rapidly and have been applied in many fields. Because deep learning can automatically extract vulnerability-related context features from source code, it has been introduced to predict vulnerabilities in source code. This field has great potential, but little research has been done, and it lacks guiding principles and comprehensive prediction systems. By analyzing deep learning methods for vulnerability prediction, this paper summarizes the challenges faced in this field, presents a corresponding solution for each, and implements a vulnerability prediction system that meets both coarse-grained and fine-grained requirements. Experiments show that the system can make coarse-grained predictions at the file level, indicating whether a file is likely to contain vulnerabilities, and can also perform fine-grained prediction and location of certain vulnerability types within a small code snippet. This paper also verifies that the fine-grained prediction system can potentially find new types of software vulnerabilities. Specifically, the main contributions of this paper are as follows: 1) Analyze the applicable framework for deep learning-based vulnerability prediction and the key problems and challenges faced in this field. First, to address the lack of clear guiding principles in this field, this
paper analyzes and discusses the mutual adaptability of deep learning and vulnerability prediction, and derives guiding principles that a basic deep learning-based vulnerability prediction system can follow. These principles center on three issues: prediction granularity, intermediate representation, and the selection of deep learning algorithms. Based on these principles, this paper gives two prediction frameworks, one for coarse-grained and one for fine-grained requirements, and analyzes their design in detail. This paper also analyzes other, more specific challenges in detail, including vulnerability location and type identification, the problem of very long sequences, the task targeting of features, and cross-project prediction. 2) Design and implement a coarse-grained prediction system based on deep learning. The system is built on the framework given by the guiding principles and solves the problem of very long sequences in the binary deep learning classification algorithm, so that feature extraction no longer relies only on vulnerability labels and more internal characteristics of the data can be learned via unsupervised methods. The results of multiple groups of experiments confirm the feasibility of the framework. Under this framework, this paper designs two augmented feature extraction methods, two-model learning (TML) and two-task learning (TTL), which also take vulnerability labels into consideration during feature extraction and thus yield features that better differentiate vulnerable code from clean code. These methods are highly general and are also suitable for coarse-grained prediction based on other feature extraction algorithms. For the challenge of cross-project prediction, this paper designs a versatile intermediate representation and a general multi-project feature extraction method based on adversarial learning (AL), and verifies its effectiveness. 3) Solve the challenges of vulnerability location and type identification, and verify the feasibility of
predicting new vulnerability types. For vulnerability location and type identification, this paper carefully designs fine-grained intermediate representations so that the model inputs correspond to specific code locations, thereby achieving vulnerability location. By evaluating fine-grained prediction models for different vulnerability types, this paper concludes that a one-to-one vulnerability prediction model can be built from a small amount of data to realize one-to-one type identification. Furthermore, based on the existing data for the four types of vulnerabilities, this paper designs multiple sets of experiments to analyze the feasibility of cross-type vulnerability prediction. It concludes that, given a large dataset or training data covering enough vulnerability types, the trained models can predict new types of vulnerabilities as effectively as they predict existing types. |