| Software aging,which is caused by the accumulation of aging-related bugs,refers to phenomenon of system performance decline and even system crash in the long-term running system.In order to analyze and predict aging-related bugs,the software aging-related bugs prediction technology uses the information contained in the code to determine whether there are aging-related bugs in software.It can help developers and testers find modules that are more prone to contain aging-related bugs,so that they can refactor code more effectively and avoid potential aging-related bugs from the source.In order to predict aging-related bugs,previous researchers usually focused on manually designed features extracted from programs and used different machine learning methods to detect defective codes.However,these traditional features usually cannot distinguish the semantic differences between programs.What’s more,there are generally fewer files containing aging-related bugs compared with files that do not contain aging-related bugs,which is a serious class imbalance problem within the project.As a result,it is difficult to obtain sufficient training samples of aging-related bugs.To address these issues,this thesis proposes an approach based on semantics features and fuzzy oversampling.First,it uses convolutional neural network to automatically learn semantic features of programs from the abstract syntax trees(AST),and then uses fuzzy oversampling methods to alleviate the class imbalance problem within the project,so as to achieve the purpose of effectively predicting aging-related bugs.The main work of this thesis is as follows:1)Based on abstract syntax trees,deep learning is adopted to learn semantic features from program source codes.As a tree-like data structure of program’s source code,ASTs can well describe the context relationship between program,and can be used to effectively analyze and predict aging-related bugs.We first extract the abstract syntax trees from the program source codes,and convert them into token vectors by traversing.Then use the word embedding technique to encode the token vectors into numerical vectors and input them into the convolutional neural network.Finally,the convolutional neural network will automatically learn the semantic features of the program source codes.Experimental results show that the aging-related bug prediction method based on program semantic features improves the Bal value on Linux and My SQL dataset by 1% and 4.7%,compared with the traditional manually designed features-based approach.2)A fuzzy based over-sampling algorithm is used to synthesize minority samples from the observed data to alleviate the class imbalance within the project.In real-world datasets,minority samples(defective samples)often account for a small proportion of the dataset.Since the model lacks sufficient training data,the prediction performance of the model is not ideal.However,the generally used random oversampling method only randomly selects ARB samples repeatedly to increase the training samples,which increases the risk of model overfitting.In this thesis,a semantics features and fuzzy oversampling-based aging-related bugs prediction model is proposed by combining the semantic features extracted from the program source codes and the fuzzy oversampling method.Firstly,the convolutional neural network is used to learn and generate the semantic features of the program,and then the learned semantic features are combined with traditional aging-related features.On the hybrid features,the fuzzy oversampling method is used to synthesize minority samples to obtain sufficient training data.Finally,on the balanced dataset,various machine learning algorithms are applied to predict aging-related bugs.The experimental results indicate that our semantics features and fuzzy oversampling-based aging-related bugs prediction model has improved the Bal value by 29.2% and 20.2% on average,compared with traditional features and oversampling-based method on Linux and My SQL dataset,and improved the Bal value by 5.0% and 11.1% on average compared with ensemble model. |