Code summarization is the task of describing the functionality of code concisely in natural language. High-quality code summaries help developers understand and maintain code more effectively. Code maintenance has always been an important part of the software product life cycle and a difficult problem in software engineering, so research on the automatic generation of code summaries has both scientific significance and practical value. The objective of automatic code summarization is to establish a mapping between programming languages and natural languages using algorithms and models. Compared with natural languages, programming languages are highly structured, logical, and implicit, and their representation forms are open. With the advent of deep learning, automatic code summarization models can leverage large datasets to learn and comprehend code in a data-driven manner, offering a promising way to bridge the gap between the two kinds of languages. However, current code summarization models fail to fully exploit the structural and semantic information contained in Abstract Syntax Trees (ASTs), especially when mining multiple features from code, and existing models ignore the multi-source nature of the input information when designing their encoder and decoder structures. This paper addresses these two issues.

(1) In terms of code data mining, we explore multiple features of code, including lexical, syntactic, structural, and semantic features, and extract and apply them to code summarization. To address the difficulty of representing the structural information of the AST, we propose a triple encoding method that combines absolute and relative positions to mark the positional relationships between nodes. To address the difficulty of semantic mining, we propose a method that extends the edge relationships between AST nodes to explicitly capture the control-flow and data-flow information that is implicit at the syntax level, thereby transforming the AST into a semantic graph.

(2) In terms of model design, we present a novel automatic code summarization model named MF-CodeSum, based on the Transformer architecture. To address the incomplete utilization of multi-source feature information, we introduce several optimizations: at the encoding stage, an additional graph convolutional network encodes the graph information; at the decoding stage, an improved attention-based multi-feature decoder fuses the encoded features. In addition, we modify the pointer-generator network (PGN) into a multi-source pointer-generator (MPG) network. As the generator that produces code summaries, the MPG accepts inputs from multiple sources, expanding the output vocabulary from the summary vocabulary alone to the union of the lexical, syntactic, and summary vocabularies, and thus generating more accurate summary descriptions. By exploiting information from multiple feature sources more effectively, MF-CodeSum achieves significant improvements in code summarization performance.
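As a concrete illustration of the AST-to-semantic-graph transformation described in contribution (1), the sketch below adds control-flow and data-flow edges on top of the parent-child edges of a toy AST. The node representation, edge labels, and position scheme are illustrative assumptions, not the exact design used in MF-CodeSum.

```python
# Minimal sketch: turn an AST into a "semantic graph" by adding explicit
# control-flow and data-flow edges to the syntactic parent-child edges.
# Node fields and edge labels are assumptions made for illustration.
from dataclasses import dataclass, field

@dataclass
class AstNode:
    idx: int                       # absolute position (preorder index)
    label: str                     # node type, e.g. "Block", "Def:x", "Use:x"
    children: list = field(default_factory=list)

def build_semantic_graph(root):
    """Return (nodes, edges); each edge carries an explicit relation type."""
    nodes, edges = [], []

    def visit(node, parent=None, child_rank=0):
        nodes.append(node)
        if parent is not None:
            # syntactic edge; child_rank records the relative position
            edges.append((parent.idx, node.idx, f"child_{child_rank}"))
        for rank, child in enumerate(node.children):
            visit(child, node, rank)

    visit(root)

    # Control-flow edges: connect consecutive statements inside a block.
    for node in nodes:
        if node.label == "Block":
            for a, b in zip(node.children, node.children[1:]):
                edges.append((a.idx, b.idx, "next_stmt"))

    # Data-flow edges: link each variable use back to its most recent write.
    last_write = {}
    for node in nodes:                      # nodes are visited in preorder
        if node.label.startswith("Def:"):
            last_write[node.label[4:]] = node.idx
        elif node.label.startswith("Use:"):
            name = node.label[4:]
            if name in last_write:
                edges.append((last_write[name], node.idx, "data_flow"))
    return nodes, edges
```

Here the preorder index idx and the child_k edge label stand in for the absolute and relative positions mentioned in the triple encoding; how MF-CodeSum encodes them exactly is not specified by this sketch.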
(3) The performance of MF-CodeSum was evaluated on two large-scale, open-source Java datasets. The experimental results show that MF-CodeSum outperformed the comparative models on all three evaluation metrics, with respective gains of 1.88, 0.49, and 2.84 on the Java-A dataset. On the Java-B dataset, the BLEU-4 and METEOR scores improved by 1.24 and 1.76, respectively, indicating that MF-CodeSum has superior code understanding and summarization capabilities. These experimental results confirm the effectiveness of the proposed model.
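To make the vocabulary-extension idea behind the MPG network in contribution (2) concrete, the sketch below mixes a generation distribution over the summary vocabulary with copy distributions over two source streams (code tokens and AST tokens). The gating and attention details are assumptions for illustration and are not taken from the MF-CodeSum implementation.

```python
# Hedged sketch of one multi-source pointer-generator decoding step:
# the final word distribution blends (a) generation over the summary
# vocabulary with (b) copying from the code-token and AST-token inputs.
import torch
import torch.nn.functional as F

def mix_distributions(gen_logits, code_attn, code_ids, ast_attn, ast_ids,
                      gates, extended_vocab_size):
    """
    gen_logits : (B, V)  scores over the summary vocabulary
    code_attn  : (B, Lc) attention over code tokens; code_ids: (B, Lc) ids
    ast_attn   : (B, La) attention over AST tokens;  ast_ids : (B, La) ids
    gates      : (B, 3)  mixing weights for [generate, copy-code, copy-ast]
    Returns a (B, extended_vocab_size) distribution over the extended vocabulary.
    """
    p_gen = F.softmax(gen_logits, dim=-1)
    p_gen = F.pad(p_gen, (0, extended_vocab_size - p_gen.size(-1)))
    gates = F.softmax(gates, dim=-1)

    out = gates[:, 0:1] * p_gen
    # Scatter copy probabilities onto the extended-vocabulary ids of the sources.
    out = out.scatter_add(1, code_ids, gates[:, 1:2] * code_attn)
    out = out.scatter_add(1, ast_ids, gates[:, 2:3] * ast_attn)
    return out

if __name__ == "__main__":
    B, V, V_ext, Lc, La = 2, 10, 16, 5, 4
    dist = mix_distributions(torch.randn(B, V),
                             torch.softmax(torch.randn(B, Lc), -1),
                             torch.randint(0, V_ext, (B, Lc)),
                             torch.softmax(torch.randn(B, La), -1),
                             torch.randint(0, V_ext, (B, La)),
                             torch.randn(B, 3), V_ext)
    assert torch.allclose(dist.sum(-1), torch.ones(B))  # still a valid distribution
```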