| Even with pseudocode as guidance, generating executable programs that span dozens of lines of code remains a challenging task. The executable program generation task is therefore usually divided into two stages. In the candidate code generation stage, existing work uses a sequence-to-sequence model to generate a set of candidate code for each line of pseudocode. However, the traditional model has shortcomings in modeling pseudocode and code: for example, it cannot effectively represent unknown words formed from identifiers, and it ignores the semantic and syntactic structure of code. In the code search stage, existing work can only use the compiler's error information to locate the erroneous code and cannot repair it. Moreover, the context of the erroneous code and the error information are handled separately, which ignores the semantic relationship between them. The main contributions of this paper are as follows. First, in the candidate code generation stage, the Transformer-based sequence-to-sequence model is improved and a multi-granularity generator is proposed. At fine granularity, unknown identifiers in the pseudocode are decomposed into subword sequences. At coarse granularity, each identifier is kept intact so that it can be copied directly into the candidate code. The output target of the decoder is changed from a symbol sequence to the abstract syntax tree of the candidate code, and the topological relationship between nodes is modeled by structural position encoding. Compared with the symbol sequence, the abstract syntax tree introduces extra non-terminal nodes, so we propose a module extraction algorithm that compresses the abstract syntax tree in order to shorten the decoding path and reduce the decoding difficulty. Second, in the code search stage, we propose to locate and fix the candidate code that causes compilation errors at the same time. To avoid generating the repaired version of the erroneous code from scratch, an edit-repair model is proposed, which transforms the repair process into the generation of a repair script. The model builds the semantic correspondence between the context of the erroneous code and the error information fed back by the compiler through a graph attention network. In addition, the model introduces a conditional variational autoencoder, which generates different repair scripts by sampling for the search algorithm to test. In the experiments, the functional correctness of a program is judged by whether it runs successfully. The test results show that the multi-granularity generator improves the quality of the candidate code. Moreover, given the same candidate code, the search algorithm based on the edit-repair model finds more executable, functionally correct programs in fewer attempts than existing algorithms. |
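
To make the idea of a repair script concrete, the following is a minimal sketch of how an edit script could be applied to an erroneous line of code instead of regenerating the line from scratch. It is an illustration only, not the edit-repair model described above: the `Edit` class, the four operation names (`keep`, `delete`, `replace`, `insert`), and the function `apply_edit_script` are all hypothetical and were chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    """One step of a hypothetical repair script (names are assumptions)."""
    op: str                      # "keep", "delete", "replace", or "insert"
    token: Optional[str] = None  # new token for "replace" and "insert"


def apply_edit_script(tokens: List[str], script: List[Edit]) -> List[str]:
    """Apply a left-to-right edit script to a tokenized line of code."""
    out: List[str] = []
    i = 0  # index of the next unread source token
    for edit in script:
        if edit.op == "insert":
            out.append(edit.token)  # an insert consumes no source token
            continue
        if i >= len(tokens):
            raise ValueError("edit script consumes more tokens than the line has")
        if edit.op == "keep":
            out.append(tokens[i])
        elif edit.op == "replace":
            out.append(edit.token)
        elif edit.op != "delete":
            raise ValueError(f"unknown op: {edit.op}")
        i += 1  # keep, replace, and delete each consume one source token
    out.extend(tokens[i:])  # tokens not covered by the script are kept as-is
    return out


# A candidate line that fails to compile because the semicolon is missing.
broken = ["int", "x", "=", "0"]
script = [Edit("keep")] * 4 + [Edit("insert", ";")]
fixed = apply_edit_script(broken, script)
```

Under these assumptions, a model only has to predict a short sequence of operations, most of which are `keep`, and several sampled scripts can be applied to the same erroneous line and handed to the compiler in turn.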