| With the development of big data and the accumulation of a large amount of knowledge in data, software systems have gradually shifted from informatization to intelligence, as in intelligent software engineering. Source code comprehension supports many intelligent software engineering tasks, including code classification, defect detection, clone detection, and code retrieval. However, existing comprehension methods cannot fully capture code semantics from the literal text, and they are complex and not robust with respect to code syntax: they downplay or even ignore syntax extraction. Furthermore, methods based on the abstract syntax tree (AST) are disturbed by considerable noise, which significantly reduces the performance of code comprehension. Objectively, source code comprehension is the process of producing a text that expresses the function of the code. We can therefore look for a mapping that represents code as a fixed-length, low-dimensional, dense vector, which can be used to measure the similarity between code and text in a semantic space under the vector space model. We first convert the source code into a programming-language-independent AST, and then construct a sequence of path pairs based on our definitions and algorithms to obtain features that combine syntax and semantics. In this thesis, we propose a hybrid encoder consisting of a sub-token encoder for semantics and a path pair encoder for syntax. The sub-token encoder encodes text fragments in the source code into a vector, which strengthens the semantic features. We adopt a simpler static path encoder to handle the syntax of the code, which leads to more robust and accurate syntax comprehension. In particular, we also propose a dynamic path fusion method named Self-attention based Path Fusion (SPF). As a result, syntax features can be fused more effectively, the noise in existing AST encoding methods can be greatly reduced, the accuracy of code comprehension can be improved, the problem scale can be reduced, and code encoding can be more efficient. We evaluate our methods on two tasks: 1) method name generation and 2) semantic matching between code and text. The experimental results show that all our methods clearly outperform the baseline on both tasks. Thanks to the robustness and efficiency of the RNN, the sequential features between AST nodes are captured more precisely, which improves syntax feature extraction. The proposed dynamic SPF method, attached to the RNN, suppresses the interference of noise in the data and automatically produces a high-quality fused feature. Compared with the baseline, the metrics on the two tasks improve by about 20% and 60%, respectively. In addition, using a variant of the loss function and API corpora, we further optimize the proposed SPF into a better code comprehension method. |
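
The two kinds of input feature named above, sub-tokens and AST paths between pairs of nodes, can be illustrated with a minimal sketch. The abstract does not give the thesis's own path pair definitions, so this example assumes the common leaf-to-leaf formulation (a path from one terminal up to the lowest common ancestor and down to another terminal). It also uses Python's built-in `ast` module rather than a language-independent AST, and the names `split_subtokens`, `collect_leaves`, `path_between`, `extract_path_contexts`, and `max_length` are hypothetical.

```python
import ast
import re
from itertools import combinations


def split_subtokens(name):
    """Split an identifier into lowercase sub-tokens (camelCase and snake_case)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def collect_leaves(tree):
    """Return (token, ancestor_chain) for each terminal; the chain runs root -> leaf.

    Assumption: only names, arguments, and constants count as terminals here.
    """
    leaves = []

    def visit(node, ancestors):
        chain = ancestors + [node]
        if isinstance(node, ast.Name):
            leaves.append((node.id, chain))
        elif isinstance(node, ast.arg):
            leaves.append((node.arg, chain))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), chain))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, chain)

    visit(tree, [])
    return leaves


def path_between(chain_a, chain_b):
    """Node-type path from leaf a up to the lowest common ancestor, then down to leaf b."""
    k = 0
    while k < min(len(chain_a), len(chain_b)) and chain_a[k] is chain_b[k]:
        k += 1
    up = [type(n).__name__ for n in reversed(chain_a[k - 1:])]  # leaf a .. common ancestor
    down = [type(n).__name__ for n in chain_b[k:]]  # below common ancestor .. leaf b
    return tuple(up + down)


def extract_path_contexts(source, max_length=8):
    """Return (left_token, path, right_token) triples for all pairs of terminals."""
    leaves = collect_leaves(ast.parse(source))
    contexts = []
    for (tok_a, chain_a), (tok_b, chain_b) in combinations(leaves, 2):
        path = path_between(chain_a, chain_b)
        if len(path) <= max_length:  # drop very long paths, a simple noise filter
            contexts.append((tok_a, path, tok_b))
    return contexts


if __name__ == "__main__":
    print(split_subtokens("getFileName"))
    for context in extract_path_contexts("def add(a, b):\n    return a + b"):
        print(context)
```

In a full pipeline of this kind, each sub-token list and each path would be mapped to embeddings and passed to the respective encoders; the length cutoff stands in for whatever noise reduction the actual method applies.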