
From Code To Natural Language: Type-aware Sketch-based Seq2seq Learning

Posted on: 2022-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Deng
Full Text: PDF
GTID: 2518306500450554
Subject: Software engineering
Abstract/Summary:
With the continuous advancement of informatization and digitization, organizations in every sector generate large amounts of code to keep their business running normally and efficiently, such as quantitative trading and financial-statement analysis systems in the financial field, and the interconnection of weapons and equipment and combat command information systems in the military field. These information systems play an increasingly important role, yet the code behind them generally lacks sufficient comments written during development. As time goes by, maintainers and developers facing complex systems must first spend considerable effort understanding what the code means. Automatically generating comments for code can therefore save effort and improve efficiency for maintainers and developers.

At present, traditional approaches to code comment generation either rely on professionals to hand-write rules and templates from which comments are then generated, or encode the source code directly and feed it into a neural network that generates comments end to end. Both approaches have drawbacks: the former requires substantial manual effort to write rules, and those rules often fail in some cases; the latter sometimes produces meaningless output, such as invalid repetitive phrases.

To solve these problems, this thesis studies how to combine the advantages of the two approaches, using rules to guide the neural network's generation so that the generated comments are both meaningful and fluent. The thesis therefore designs a two-layer neural network model. The first layer generates natural-language templates; its template vocabulary contains only the most commonly used words plus some type placeholders, keeping the vocabulary small and simple enough that the first-layer model can easily learn the rules for generating templates. The input of the first layer is a feature vector composed of the source code's word embeddings together with its type and position embeddings; a bidirectional recurrent neural network layer encodes this input, and an LSTM decodes it to obtain the template. The second layer generates the final natural-language comment: its input is the template produced by the first layer together with the code feature vector, which another bidirectional recurrent neural network layer encodes; finally, an LSTM decoder with attention and a copy mechanism outputs the natural-language comment for the code.

After building the model, we conducted experiments on a Python dataset. The model achieves good results on the three mainstream metrics BLEU, METEOR, and ROUGE-L, and shows a clear improvement over contemporaneous methods. We also analyzed the model carefully and explored the influence of each module on the final result. The results show that the proposed template-guided structure contributes the most to the final result.
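The two-stage pipeline described above can be illustrated with a deliberately simplified sketch. In the thesis, stage one is a BiRNN encoder with an LSTM decoder and stage two is an LSTM decoder with attention and a copy mechanism; here both neural components are replaced by toy rules so the data flow is visible. The token types, the tiny template vocabulary, and the "copy the most frequent source identifier" rule are all illustrative assumptions, not the thesis's actual method.

```python
from collections import Counter

# Tiny template vocabulary of common words (assumed for illustration only).
COMMON_WORDS = {"return", "the", "of", "a", "list", "sum"}
KEYWORDS = {"def", "return", "if", "for", "while"}

def tokenize(code):
    """Split code into (token, type) pairs; the type tags are assumptions."""
    pairs = []
    for tok in code.split():
        if tok in KEYWORDS:
            pairs.append((tok, "KW"))
        elif tok.isidentifier():
            pairs.append((tok, "NAME"))
        elif tok.isdigit():
            pairs.append((tok, "NUM"))
        else:
            pairs.append((tok, "OP"))
    return pairs

def generate_template(draft):
    """Stage 1 stand-in: keep common words, abstract the rest into a
    <NAME> type slot, mimicking the small-vocabulary template layer."""
    return [w if w in COMMON_WORDS else "<NAME>" for w in draft]

def fill_template(template, code_pairs):
    """Stage 2 stand-in: 'copy' the most frequent source NAME token into
    each slot, a crude proxy for the attention + copy mechanism."""
    names = Counter(tok for tok, typ in code_pairs if typ == "NAME")
    best = names.most_common(1)[0][0] if names else "<unk>"
    return [best if w == "<NAME>" else w for w in template]

pairs = tokenize("def total ( xs ) : return sum ( xs )")
template = generate_template(["return", "the", "sum", "of", "xs"])
print(" ".join(fill_template(template, pairs)))  # return the sum of xs
```

The point of the two stages is visible even in this toy: the template layer only has to choose among a handful of common words and slots, while the copy step grounds the slots in tokens actually present in the source, which is what keeps the output from degenerating into repetitive, meaningless phrases.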
Keywords/Search Tags: comment generation, recurrent neural network, natural language processing, template guidance, deep learning