Research And Application On Code Annotation Generation Based On Seq2seq

Posted on:2024-09-04

Degree:Master

Type:Thesis

Country:China

Candidate:X D Ren

Full Text:PDF

GTID:2568307106990259

Subject:Electronic information

Abstract/Summary:

With the rapid development of artificial intelligence algorithms, computers can now automatically generate text that conforms to language rules and is semantically fluent, and many text generation tasks have mature technical systems. In software development, code comments are very important: they can greatly improve the efficiency of later maintenance staff and reduce the maintenance cost of a project. However, developers often spend most of their time at the beginning of a project on implementing the functional logic, and because of tight schedules a large number of comments are never written. Researchers have proposed various neural network-based annotation generation models, but these models still leave considerable room for improvement in accuracy and fluency. This thesis treats code annotation generation as a text translation task and carries out a series of studies on annotation generation for Java methods based on the Sequence-to-Sequence (Seq2Seq) model. The main contents are as follows:

1. The thesis presents an annotation generation model, CSE-GC, based on the Seq2Seq model. The encoder of this model is composed of a Gated Recurrent Unit (GRU) and a Convolutional Neural Network (CNN), which respectively extract the structural and semantic information of the code. The Abstract Syntax Tree (AST) is an abstract representation of the syntactic structure of Java source code, in which each node corresponds to a construct in the source code. To make it easier for the encoder to obtain code structure information, the thesis proposes a Code Structure Enhancement (CSE) traversal method for abstract syntax trees. The effectiveness of the model and of the CSE traversal method is validated through experiments that compare CSE-GC with other advanced annotation generation models on the same dataset. The results show a significant improvement in both BLEU-4 and METEOR for CSE-GC, which reaches 45.01% and 30.95% respectively, indicating that CSE-GC is effective at generating annotations.

2. The Seq2Seq-based model CSE-GC has a serious drawback: it is not robust, and the generated annotations are often of poor quality when the input is slightly perturbed. To improve robustness and alleviate the problem of sparse data, the thesis proposes GAN-CSE-GC, an annotation generation architecture that incorporates Generative Adversarial Networks (GAN). Specifically, CSE-GC serves as the generator of GAN-CSE-GC, and a CNN-based network is designed as the discriminator. In addition, the thesis proposes a method for generating noisy code data; the constructed noisy data and the real data are fed into GAN-CSE-GC for adversarial training. Experimental results show that, on noisy data, GAN-CSE-GC improves BLEU-4 by 0.6% and METEOR by 1.14% compared with CSE-GC, which indicates that adversarial training improves the robustness of CSE-GC on such inputs.
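
The abstract does not spell out how the CSE traversal works, so the following is only an illustrative sketch of the general idea it builds on: flattening an abstract syntax tree into a token sequence that a sequence encoder such as a GRU could consume. It is a plain bracketed pre-order traversal, not the CSE method itself. As a further assumption, it uses Python's standard `ast` module on a small Python function instead of Java source, and `linearize` is a hypothetical helper name.

```python
import ast


def linearize(node):
    """Flatten an AST into a bracketed pre-order token sequence.

    Inner nodes are wrapped in "(" ... ")" so the tree shape can be
    recovered from the flat sequence; leaf nodes are emitted as a
    single token holding the node type name.
    """
    label = type(node).__name__
    children = list(ast.iter_child_nodes(node))
    if not children:
        return [label]
    tokens = ["(", label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens


# Hypothetical input: a tiny function standing in for a Java method.
source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)
tokens = linearize(tree)
print(" ".join(tokens))
```

The brackets keep the nesting unambiguous, which is why structure-preserving traversals are a common way to feed tree-shaped code to a sequence model. The exact node names printed depend on the Python version.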

Keywords/Search Tags:

Program Understanding, Code Comment Generation, Sequence-to-Sequence Model, Generative Adversarial Net, Abstract Syntax Tree

Related items

1	Research On Code Recommendation And Comment Generation With Context Information
2	Research On Code Comment Generation Method Based On Neural Network
3	Design And Implementation Of Code Clone Analysis System Based On Sequence Matching
4	Program Semantics Understanding Via Machine Learning
5	Structure-aware Graph Neural Network For Code Comment Generation
6	Research On Source Code Plagiarism Detection Based On Abstract Syntax Tree
7	Automatically Based On The Abstract Syntax Tree And Static Analysis Of The Cloned Code Refactoring
8	Research On Sentiment Dialogue Based On Generative Adversarial Networks
9	Research On Emotional Dialogue Generation Model Based On Deep Learning
10	Pre-training For Program Understanding And Generation