Research On Automatic Code Summarization Technology Based On Enhanced Multi-modal Representation

Posted on: 2024-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Ma
GTID: 2568307058982079
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
With the rapid development of Internet technology, software is growing in scale and complexity, and the pressure on developers is growing with it. During development and maintenance, developers often spend a great deal of time and effort understanding what code does. Automatic code summarization was developed to relieve this pressure and improve the efficiency of software development: given source code as input, it automatically generates a natural language description of the functionality the code implements, helping developers understand the code correctly and quickly.

Current automatic code summarization techniques fall into three main categories: template-based, information retrieval-based, and deep learning-based. Template-based methods generate code comments from hand-crafted templates. They require function names and user-defined identifier names to follow standard conventions; otherwise the quality of the generated summary suffers. Information retrieval-based methods mainly apply the vector space model to retrieve summaries from similar code fragments, so the quality of the result depends on the similar code that is found. Deep learning-based methods train a model that learns to generate summaries for source code; this approach can take the structural information of the source code into account and is scalable. In recent years, deep learning-based automatic code summarization has become the mainstream research direction.

However, current deep learning-based techniques have the following problems. (1) There is a fine-grained association (e.g., lexical semantics and node properties) between the word sequence (Token) modality and the abstract syntax tree (AST) modality of code. Fusing the two modalities through this fine-grained association would provide more hints for summary generation, but existing techniques do not consider it. (2) AST modality information needs to be used efficiently while its structural information is retained. Current techniques that encode ASTs directly with tree-based neural networks suffer from long training times and vanishing gradients, while linearization methods that flatten ASTs into sequences in essence lose the hierarchical information of the AST. (3) The grammar rules of the AST modality are very important for summary generation, yet existing techniques rarely consider them.

To solve these problems, this thesis proposes an automatic code summarization technique based on enhanced multi-modal representation. The research covers three main aspects. (1) To address the neglect of the fine-grained association between the Token modality and the AST modality, the thesis proposes an automatic code summarization method based on multi-modal fine-grained feature fusion, which fuses the two modalities at a fine-grained level to represent the code more comprehensively. A fine-grained feature matching module matches the vectors of the Token modality with those of the AST modality, and a fine-grained feature fusion module then fuses the two modalities using the matched information. (2) To address the problem that current uses of AST modality information cannot take both structural information and encoding efficiency into account, the thesis proposes a multi-modal automatic code summarization method based on AST modular feature enhancement. It splits the AST modality of the code into modules and then aggregates them level by level, which improves the encoding efficiency of the AST, retains its structural information, mitigates the vanishing gradient problem, and improves the robustness of the generated summaries. (3) To address the problem that the grammar rules of the AST modality have not been considered, the thesis proposes applying these grammar rules to automatic code summarization. This yields further information about the AST, represents the code more comprehensively, and provides more hints for summary generation.
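As a rough illustration of the two modalities and of the modular split described above, the following sketch uses Python's standard `tokenize` and `ast` modules as stand-ins. The abstract does not state which programming language, tokenizer, or parser the thesis uses, so the sample function, the helper names (`token_modality`, `ast_modality`, `modular_split`), and the choice of one module per top-level statement are all assumptions made for illustration. The neural matching, fusion, and level-by-level aggregation steps are not shown.

```python
import ast
import io
import tokenize

# Hypothetical example input; not taken from the thesis.
SOURCE = "def add(a, b):\n    total = a + b\n    return total\n"


def token_modality(source):
    """Return the lexical tokens of the code (names, operators, literals)."""
    keep = {tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING}
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in keep]


def preorder_types(node):
    """Return the node type names of an AST subtree in pre-order."""
    names = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        names.extend(preorder_types(child))
    return names


def ast_modality(source):
    """Return the whole AST as a pre-order list of node type names."""
    return preorder_types(ast.parse(source))


def modular_split(source):
    """Assumed splitting rule: one subtree per statement in the body of
    each top-level function. The thesis may split differently."""
    tree = ast.parse(source)
    modules = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            modules.extend(preorder_types(stmt) for stmt in node.body)
    return modules


if __name__ == "__main__":
    print("Token modality:", token_modality(SOURCE))
    print("AST modality:  ", ast_modality(SOURCE))
    for i, module in enumerate(modular_split(SOURCE)):
        print(f"AST module {i}:", module)
```

In this sketch each AST module is a small subtree that could be encoded on its own before the results are combined, which is the intuition behind splitting first and aggregating afterwards; the flattened pre-order list, by contrast, shows how linearization discards the nesting.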
Keywords/Search Tags: Automatic code summarization, Fine-grained fusion, Modular split, Level-by-level aggregation, Grammar rules