| In software development, the complexity of software systems and constantly changing requirements make defects difficult to avoid, and these defects seriously degrade software quality. Fixing defects promptly, before the software is released, is therefore critical. To improve software quality and reduce the time and cost of repair, researchers continue to explore automatic program repair techniques. Template-based and machine-learning-based automatic repair methods have made significant progress in recent years, but both still have shortcomings. Repair methods that mine templates automatically have limited template mining ability, which results in poor template quality, while machine-learning-based repair methods generate a patch space with a large proportion of uncompilable patches and rank correct patches inaccurately. This dissertation addresses these problems in its research on automatic program repair methods. The main work is as follows: (1) To address the poor template quality caused by limited template mining ability in template-based automatic repair, this dissertation proposes GTML, an automatic program repair method based on machine learning. First, a leaf label refinement algorithm and a context extraction algorithm are designed to improve the representation of defect information and the accuracy with which it is mined. Second, GTML applies machine learning and clustering techniques to defect clustering, which resolves false associations and faulty generalization between defect cases and improves the accuracy of template mining. Finally, to address the loss of information during template generalization, GTML integrates statistical-information generalization with hierarchical generalization and combines them with information from the defect cases, which avoids overgeneralization and improves template mining ability. Experiments show that GTML mines templates effectively and achieves high defect repair performance in automatic software repair tasks. (2) To address the high proportion of uncompilable patches and the inaccurate ranking of correct patches in the patch space generated by machine-learning-based automatic repair, this dissertation proposes INAPR, an automatic program repair method based on an identifier-aware NMT model. First, byte-pair encoding is used to tokenize compound and rare words, producing a smaller but more accurate token set; this resolves the problem that the correct token is hard to find, or missing altogether, during patch generation, and reduces the number of uncompilable patches. Second, to address the low proportion of compilable patches produced by NMT models, the T5 framework from natural language processing is used to train a defect repair model together with a new identifier-awareness task, and this model is fused with the CoNuT model in the fine-tuning stage to build a defect repair model with higher patch accuracy. Finally, a token-aware beam search strategy is proposed in which the model selects high-scoring tokens when generating patches, which raises the rank of patches that are likely to be correct and reduces the number of uncompilable patches. Experiments show that INAPR effectively increases the number of correct patches in the candidate patch space and improves defect repair performance in automatic software repair tasks. |
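
To make the byte-pair encoding step in (2) concrete, the following is a minimal Python sketch of how merge rules can be learned from identifiers and then used to split a compound identifier into subword tokens. It illustrates the general technique only and is not INAPR's implementation; the function names `learn_bpe`, `merge_pair`, and `segment`, the sample identifiers, and the merge count are assumptions chosen for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of identifier strings."""
    # Every word starts as a sequence of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new rule to the whole vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split `word` into subword tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical identifiers standing in for a code corpus.
    rules = learn_bpe(["getName", "getValue", "setName"], num_merges=10)
    print(segment("getValueName", rules))
```

Because any identifier can be assembled from learned subwords (falling back to single characters), a rare or compound name that never appeared whole in the training data can still be generated, which is the property the abstract relies on to reduce missing-token failures.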