
Research And Implementation Of Automated Program Repair Method Based On Pre-trained Model

Posted on: 2024-02-04    Degree: Master    Type: Thesis
Country: China    Candidate: H Y Zhang    Full Text: PDF
GTID: 2568306914988239    Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Automatic program repair is a key technology for improving the efficiency of software defect repair, reducing its difficulty, and lowering software maintenance costs; it also frees software developers from the complex and tedious work of software operation, maintenance, and defect repair. In recent years, with deepening research on deep learning and neural networks, researchers in automatic program repair have gradually shifted their attention from traditional defect-repair techniques to repair methods driven by deep learning. Mainstream deep-learning-based approaches use neural machine translation models to translate defective code into correct code and thereby obtain patches. Although such approaches can capture the complex correspondence between defective code and correct code and automatically learn abstract repair templates from historical defect-repair data, when the input code sequence is too long the model struggles to learn the syntactic, semantic, and structural relationships between the defective statements and the surrounding statements, so repair results fall short of expectations. Characteristics of programming languages, such as long-range semantic dependencies and free variable-naming styles, also make serialized representations inefficient for feature learning, which directly limits how well a model learns code features. This paper therefore proposes two automatic program repair methods based on pre-trained models to address these limitations of existing methods. The main work of this paper is as follows:

1) A BERT-based approach that enhances automatic program repair with contextual semantic information. To better represent defective code, the method uses the pre-trained BERT model to encode source code as sequences and adds code summaries to the model input as source-code context, strengthening the model's ability to learn code semantics; an attention-based Transformer model then generates candidate patches (see the sketch following this summary). An empirical evaluation on Defects4J, currently the most widely used benchmark, shows that the approach generates compilable patches for 63 defects, 41 of which are correct, outperforming the baseline.

2) A GraphCodeBERT-based, information-enhanced automatic repair method. On top of the input that already includes code summaries, the method adds data-flow-graph information and uses GraphCodeBERT to encode the combined multi-source input as a sequence. During feature learning the model therefore captures deeper semantic information in the code and also learns code structure more effectively. An empirical evaluation on Defects4J shows that the approach generates compilable patches for 69 defects, 46 of which are correct, again outperforming the baseline.

3) An automatic program repair system. The system analyzes program defects automatically, helping software developers or users quickly understand the defects in a program and complete the subsequent defect-repair work.
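The following is a minimal sketch, not the thesis implementation, of the encoder-decoder setup described in approach 1: a pre-trained BERT-family encoder reads the defective code together with its summary, and an attention-based Transformer decoder would be trained to emit patch tokens. The model name "microsoft/codebert-base", the toy code snippet and summary, and the untrained decoder (layer count, hidden size, single greedy step) are illustrative assumptions.

```python
# Sketch only: encode defective code plus its summary with a pre-trained
# encoder, then run one step of a Transformer decoder over the encoder memory.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")

defective_code = "if (x = 0) { return y / x; }"            # assumed buggy snippet
code_summary = "return y divided by x when x is non-zero"  # summary used as context

# Concatenate summary and code so the encoder sees both sources of context.
inputs = tokenizer(code_summary, defective_code,
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    memory = encoder(**inputs).last_hidden_state           # (1, seq_len, 768)

# Attention-based Transformer decoder that would be trained to generate patches.
decoder_layer = nn.TransformerDecoderLayer(d_model=768, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
vocab_proj = nn.Linear(768, tokenizer.vocab_size)

# One (untrained) greedy decoding step, seeded with the first encoder state.
start = memory[:, :1, :]
logits = vocab_proj(decoder(start, memory))                # (1, 1, vocab_size)
print("predicted first patch token:", tokenizer.decode(logits.argmax(-1)[0]))
```

Approach 2 follows the same pattern, except that data-flow-graph tokens are appended to the input and the encoder is swapped for GraphCodeBERT (e.g. the "microsoft/graphcodebert-base" checkpoint), so the sketch above is only the serialized-input case.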
Keywords/Search Tags:Automatic program repair, Neural machine translation, Code representation, Pre-trained models