Code summarization is the task of describing the functionality of code concisely in natural language. High-quality code summaries help developers understand and maintain code more effectively. Code maintenance has always been an important part of the software product life cycle and a difficult problem in software engineering, so research on the automatic generation of code summaries has both scientific significance and practical value. The objective of automatic code summarization is to establish a mapping between programming languages and natural languages using algorithms and models. Compared with natural languages, programming languages are highly structured, logical, and implicit, and their representation forms are open. With the advent of deep learning, automatic code summarization models can leverage large datasets to learn and comprehend code in a data-driven manner, offering a promising way to bridge the gap between the two kinds of languages. However, current code summarization models fail to fully exploit the structural and semantic information contained in Abstract Syntax Trees (ASTs), especially when mining multiple features from code, and existing models ignore the multi-source nature of the input information when designing their encoder and decoder structures. This paper addresses these two issues.

(1) In terms of code data mining, we explore multiple features of code, including lexical, syntactic, structural, and semantic features, and extract and apply them to code summarization. To address the difficulty of representing the structural information of the AST, we propose a triple encoding method that combines absolute and relative positions to mark the positional relationships between nodes. To address the difficulty of semantic mining, we propose a method that extends the edge relationships between AST nodes to explicitly capture the control-flow and data-flow information that is implicit at the syntax level, thereby transforming the AST into a semantic graph.

(2) In terms of model design, we present a novel automatic code summarization model named MF-CodeSum, based on the Transformer architecture. To address the incomplete utilization of multi-source feature information, we introduce several optimizations: at the encoding stage, an additional graph convolutional network encodes the graph information; at the decoding stage, an improved attention-based multi-feature decoder fuses the encoded features. In addition, we modify the pointer-generator network (PGN) into a multi-source pointer-generator (MPG) network. As the generator that produces code summaries, the MPG accepts inputs from multiple sources, expanding the output vocabulary from the summary vocabulary alone to the union of the lexical, syntactic, and summary vocabularies, and thus generating more accurate summary descriptions. By exploiting information from multiple feature sources more effectively, MF-CodeSum achieves significant improvements in code summarization performance.
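As a concrete illustration of the AST-to-semantic-graph transformation described in contribution (1), the sketch below adds control-flow and data-flow edges on top of the parent-child edges of a toy AST. The node representation, edge labels, and position scheme are illustrative assumptions, not the exact design used in MF-CodeSum.

```python
# Minimal sketch: turn an AST into a "semantic graph" by adding explicit
# control-flow and data-flow edges to the syntactic parent-child edges.
# Node fields and edge labels are assumptions made for illustration.
from dataclasses import dataclass, field

@dataclass
class AstNode:
    idx: int                       # absolute position (preorder index)
    label: str                     # node type, e.g. "Block", "Def:x", "Use:x"
    children: list = field(default_factory=list)

def build_semantic_graph(root):
    """Return (nodes, edges); each edge carries an explicit relation type."""
    nodes, edges = [], []

    def visit(node, parent=None, child_rank=0):
        nodes.append(node)
        if parent is not None:
            # syntactic edge; child_rank records the relative position
            edges.append((parent.idx, node.idx, f"child_{child_rank}"))
        for rank, child in enumerate(node.children):
            visit(child, node, rank)

    visit(root)

    # Control-flow edges: connect consecutive statements inside a block.
    for node in nodes:
        if node.label == "Block":
            for a, b in zip(node.children, node.children[1:]):
                edges.append((a.idx, b.idx, "next_stmt"))

    # Data-flow edges: link each variable use back to its most recent write.
    last_write = {}
    for node in nodes:                      # nodes are visited in preorder
        if node.label.startswith("Def:"):
            last_write[node.label[4:]] = node.idx
        elif node.label.startswith("Use:"):
            name = node.label[4:]
            if name in last_write:
                edges.append((last_write[name], node.idx, "data_flow"))
    return nodes, edges
```

Here the preorder index idx and the child_k edge label stand in for the absolute and relative positions mentioned in the triple encoding; how MF-CodeSum encodes them exactly is not specified by this sketch.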
(3) The performance of MF-CodeSum was evaluated on two large-scale, open-source Java datasets. The experimental results show that MF-CodeSum outperformed the comparative models on all three evaluation metrics, with respective gains of 1.88, 0.49, and 2.84 on the Java-A dataset. On the Java-B dataset, the BLEU-4 and METEOR scores improved by 1.24 and 1.76, respectively, indicating that MF-CodeSum has superior code understanding and summarization capabilities. These experimental results confirm the effectiveness of the proposed model.
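To make the vocabulary-extension idea behind the MPG network in contribution (2) concrete, the sketch below mixes a generation distribution over the summary vocabulary with copy distributions over two source streams (code tokens and AST tokens). The gating and attention details are assumptions for illustration and are not taken from the MF-CodeSum implementation.

```python
# Hedged sketch of one multi-source pointer-generator decoding step:
# the final word distribution blends (a) generation over the summary
# vocabulary with (b) copying from the code-token and AST-token inputs.
import torch
import torch.nn.functional as F

def mix_distributions(gen_logits, code_attn, code_ids, ast_attn, ast_ids,
                      gates, extended_vocab_size):
    """
    gen_logits : (B, V)  scores over the summary vocabulary
    code_attn  : (B, Lc) attention over code tokens; code_ids: (B, Lc) ids
    ast_attn   : (B, La) attention over AST tokens;  ast_ids : (B, La) ids
    gates      : (B, 3)  mixing weights for [generate, copy-code, copy-ast]
    Returns a (B, extended_vocab_size) distribution over the extended vocabulary.
    """
    p_gen = F.softmax(gen_logits, dim=-1)
    p_gen = F.pad(p_gen, (0, extended_vocab_size - p_gen.size(-1)))
    gates = F.softmax(gates, dim=-1)

    out = gates[:, 0:1] * p_gen
    # Scatter copy probabilities onto the extended-vocabulary ids of the sources.
    out = out.scatter_add(1, code_ids, gates[:, 1:2] * code_attn)
    out = out.scatter_add(1, ast_ids, gates[:, 2:3] * ast_attn)
    return out

if __name__ == "__main__":
    B, V, V_ext, Lc, La = 2, 10, 16, 5, 4
    dist = mix_distributions(torch.randn(B, V),
                             torch.softmax(torch.randn(B, Lc), -1),
                             torch.randint(0, V_ext, (B, Lc)),
                             torch.softmax(torch.randn(B, La), -1),
                             torch.randint(0, V_ext, (B, La)),
                             torch.randn(B, 3), V_ext)
    assert torch.allclose(dist.sum(-1), torch.ones(B))  # still a valid distribution
```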