| Software defect prediction (SDP) improves software reliability by helping developers identify potential bugs in code. Traditional approaches feed manually constructed code features into machine learning classifiers and therefore tend to ignore the semantic and contextual information in the source code. To address this limitation, this master’s thesis presents a Software Defect Prediction model employing BiLSTM and BERT-based semantic Feature (SDP-BB), which captures the semantic features of code to predict defects in the corresponding software. The model uses a Bidirectional Long Short-Term Memory network (BiLSTM) to exploit contextual information from the embedded token vectors learned by a BERT model, and an attention mechanism to capture the salient features of the nodes. It also employs a data augmentation technique to generate more training data. We evaluate the proposed method on ten open-source projects in terms of F1-score, in both Within-Project Defect Prediction (WPDP) and Cross-Project Defect Prediction (CPDP) experiments, and compare the full-token and AST-node data processing methods with the length coverage on each project varied from 50% to 90%. On average, SDP-BB outperforms existing state-of-the-art models by 6.7% in WPDP and 8.6% in CPDP. The results also indicate that the full-token method is more robust for software defect prediction than the AST-node method. We conclude that SDP-BB extracts the semantics of source code more effectively and produces more training data, which improves performance on the software defect prediction task compared with existing SDP models. |
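
For readers unfamiliar with the evaluation metric, the F1-score used above is the harmonic mean of precision and recall on the defective class. The following is a minimal sketch of the standard definition, not the evaluation code of this thesis; the function name `f1_score`, the label convention (1 = defective, 0 = clean), and the sample labels are assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """Return the F1-score for the positive (defective = 1) class.

    Illustrative helper only: labels are assumed to be 0 (clean) or 1 (defective).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean files wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects that were missed

    if tp == 0:
        # Precision and recall are both zero (or undefined), so report 0.0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels for five files, not data from the thesis.
    actual = [1, 0, 1, 1, 0]
    predicted = [1, 0, 0, 1, 1]
    print(f"F1 = {f1_score(actual, predicted):.3f}")
```

Because F1 ignores true negatives, it is a common choice for defect datasets, where clean files typically far outnumber defective ones.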