| Software defect prediction techniques use data mining and machine learning to analyze and predict software defects. Their purpose is to guide testers and clarify where testing effort should be focused. Most existing software defect prediction methods are based on statistical metrics, whose input data take a long time to construct. Moreover, this input data ignores the semantic information of the samples, so these methods suffer from both long construction times and poor prediction performance. In recent years, researchers have begun to explore how to use code semantics to predict software defects. This approach automatically extracts semantic features from the source code of samples, eliminating the manual construction of statistical metrics and improving prediction performance. Nevertheless, current methods based on code semantics still suffer from incomplete extraction of semantic information and insufficient utilization of semantic features. This thesis proposes a solution to each of these problems. To address the incomplete extraction of semantic information, this thesis proposes a semantic information extraction method based on abstract syntax trees. The method improves how the abstract syntax tree is traversed, reducing the number of cases in which different code fragments yield the same semantic information. To address the insufficient utilization of semantic features, this thesis proposes a contrastive feature generation network. The network consists of two main steps: step one generates differential features by thoroughly comparing the semantic features of different samples, and step two feeds these differential features into an attention-based network that maps samples with different labels into different representation spaces. Based on these two improvements, this thesis
proposes a software defect prediction method based on code semantics (BILC-Attention). This method first obtains semantic information by traversing the abstract syntax tree, then extracts semantic features from that information, enlarges the differences between semantic features through the contrastive feature generation network, and finally classifies samples through the classification network. To make full use of existing data, this thesis further proposes a software defect prediction method based on fusion features (FFLM-Attention), built on BILC-Attention. This method extends the prediction model used in BILC-Attention so that software defects are predicted from both statistical metrics and code semantics. To verify the effectiveness of the two methods, this thesis conducted multiple experiments on seven open-source projects from the PROMISE dataset. To ensure the reliability of the results, each group of experiments was repeated 10 times. The results show that BILC-Attention improves the average F1 value by 2.05% over the best-performing baseline method, and that FFLM-Attention improves the average F1 value by a further 10.28% over BILC-Attention, demonstrating the effectiveness of the two proposed methods. To make software defect prediction methods easier to apply in practice, this thesis also designs and implements a software defect prediction system. The system is divided into six modules according to users' actual needs: the homepage module, project management module, word embedding model management module, algorithm management module, prediction model management module, and defect prediction management module. The homepage module guides users in using the system, while the other modules carry out the software defect prediction function. With this modular division, the system covers a wide range of user needs and can be applied to practical
software testing work. |
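
The collision problem mentioned in the abstract, where different code fragments yield the same extracted semantic information, can be illustrated with a minimal sketch. The example below is only an assumption-laden illustration: it uses Python's built-in `ast` module as a stand-in, the function names `flat_tokens` and `structured_tokens` are hypothetical, and the traversal scheme actually proposed in the thesis is not described in this abstract and may differ substantially.

```python
import ast


def flat_tokens(source):
    """Pre-order traversal that keeps only node type names.

    Hypothetical baseline: identifiers and subtree boundaries are lost,
    so structurally different code can produce the same sequence.
    """
    tokens = []

    def visit(node):
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def structured_tokens(source):
    """Pre-order traversal that also keeps identifiers and subtree boundaries.

    Assumption for illustration only: this is one possible way to reduce
    collisions, not the method proposed in the thesis.
    """
    tokens = []

    def visit(node):
        label = type(node).__name__
        if isinstance(node, ast.Name):
            label += ":" + node.id  # keep the identifier text
        tokens.append(label)
        children = list(ast.iter_child_nodes(node))
        if children:
            tokens.append("(")  # mark where the subtree starts
            for child in children:
                visit(child)
            tokens.append(")")  # mark where the subtree ends

    visit(ast.parse(source))
    return tokens


if __name__ == "__main__":
    a, b = "f(g(x), y)", "f(g(x, y))"
    print(flat_tokens(a) == flat_tokens(b))              # same sequence for different code
    print(structured_tokens(a) == structured_tokens(b))  # sequences now differ
```

In this sketch the two call expressions nest their arguments differently, yet the type-only traversal cannot tell them apart; recording identifiers and subtree boundaries is one simple way to separate them.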