| Software defect prediction can find software defects early in the development process, helps allocate test resources reasonably, and reduces software development and maintenance costs, making it one of the important methods for ensuring software reliability. The development of machine learning has provided new ideas for software defect prediction. This paper studies software defect prediction methods based on machine learning. The main contents are as follows. Firstly, different types of software defects and software defect feature extraction methods are studied. Feature selection, class imbalance learning, and word vector techniques are combined with machine learning methods to improve the quality of software defect datasets and the performance of software defect prediction models. Secondly, a cost-sensitive feature selection algorithm is proposed to address class imbalance and dimension explosion in software defect prediction. This method considers the class imbalance problem in the feature selection stage. It first uses the weighted Gini index to construct cost-sensitive extra trees, obtains feature importance scores, and eliminates irrelevant features. It then uses the SBFS algorithm to analyze feature redundancy and eliminate redundant features, yielding the optimal feature subset. On this basis, the random forest algorithm is used to further address the class imbalance problem, and the performance of the defect prediction model is improved. Thirdly, software vulnerability is a special type of software defect. The software metric features commonly used in software defect prediction lack semantic information and cannot effectively characterize vulnerabilities. This paper proposes a vulnerability prediction model based on AT-BGRU and KNN to achieve vulnerability prediction at the program slice level. A code vectorization method is designed to convert code into integer
vectors. An AT-BGRU model is then constructed: the BGRU model learns semantic information such as function call relationships, and an attention mechanism gives high weight to keywords to reduce the impact of irrelevant information. The KNN algorithm and code similarity are then used to predict whether the code contains vulnerabilities. Finally, the effect of the cost-sensitive feature selection algorithm on class imbalance and dimension explosion is verified on the MDP dataset, and the effect of the vulnerability prediction model based on AT-BGRU and KNN on software vulnerability prediction is verified on the CGD dataset. |
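
To illustrate how a weighted Gini index can make a tree split criterion cost-sensitive, the following is a minimal sketch and not the paper's implementation. The function names, the class weights, and the 9:1 example node are assumptions chosen for illustration. Each sample contributes the weight of its class instead of a count of one, so a rare defective class can influence the impurity as much as the majority class.

```python
from collections import defaultdict


def weighted_gini(labels, class_weight):
    """Gini impurity where each sample counts with the weight of its class."""
    totals = defaultdict(float)
    for y in labels:
        totals[y] += class_weight[y]
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


def split_impurity(left, right, class_weight):
    """Weight-averaged impurity of a candidate split (lower is better)."""
    w_left = sum(class_weight[y] for y in left)
    w_right = sum(class_weight[y] for y in right)
    total = w_left + w_right
    if total == 0:
        return 0.0
    return (w_left * weighted_gini(left, class_weight)
            + w_right * weighted_gini(right, class_weight)) / total


# Hypothetical node: 9 non-defective modules (0) and 1 defective module (1).
labels = [0] * 9 + [1]
uniform = {0: 1.0, 1: 1.0}          # ordinary Gini index
cost_sensitive = {0: 1.0, 1: 9.0}   # assumed cost ratio that balances the classes

print(weighted_gini(labels, uniform))
print(weighted_gini(labels, cost_sensitive))
```

With uniform weights the node looks almost pure because the single defective module barely registers. With the assumed cost-sensitive weights the same node is maximally impure for two classes, so a tree built on this criterion is pushed to separate the defective module.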
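
The conversion of code into integer vectors can be sketched as follows. This is an illustrative assumption about one common way to do it, not the vectorization method designed in the paper: the regular-expression tokenizer, the reserved padding and unknown-token ids, and the example slices are all hypothetical.

```python
import re

PAD, UNK = 0, 1  # reserved ids for padding and unseen tokens (assumed convention)
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")  # identifiers, numbers, single symbols


def tokenize(code):
    """Split a code slice into identifier, number, and symbol tokens."""
    return TOKEN_RE.findall(code)


def build_vocab(slices):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for code in slices:
        for tok in tokenize(code):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def vectorize(code, vocab, length):
    """Map tokens to ids, then truncate or pad to a fixed length."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(code)][:length]
    return ids + [PAD] * (length - len(ids))


# Hypothetical program slices used only to build the vocabulary.
slices = ["strcpy(buf, src);", "strncpy(buf, src, n);"]
vocab = build_vocab(slices)

# "dst" was never seen, so it maps to UNK; the tail is padded with PAD.
vec = vectorize("strcpy(buf, dst);", vocab, length=10)
print(vec)
```

Fixed-length integer vectors of this kind are a typical input for a recurrent model such as a bidirectional GRU, usually through an embedding layer. Whether the paper uses an embedding layer in this way is not stated in the abstract.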