
Program Type Inference Based On Pre-trained Mask Language Model

Posted on: 2024-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Yuan
Full Text: PDF
GTID: 2568307112976799
Subject: Electronic information
Abstract/Summary:
Code snippets often involve undeclared receiver objects and non-fully-qualified names. Resolving these undeclared receiver objects and non-fully-qualified names into their corresponding fully qualified names (i.e., type inference) is a prerequisite for efficiently using the knowledge in code snippets. To infer the fully qualified name (FQN) of a type in a code snippet, existing work builds a symbolic knowledge base and adopts a keyword-matching "dictionary lookup" strategy. However, constructing a symbolic knowledge base depends on parsing compilable code files, and this compilation overhead limits the number of fully qualified names and code contexts the knowledge base can store. When type inference is performed with the keyword-matching strategy, the limited knowledge in the symbolic knowledge base causes an out-of-vocabulary (OOV) problem: a lookup for a fully qualified name that is not stored in the knowledge base returns a null value.

To solve the out-of-vocabulary problem in the type inference task, this thesis adopts prompt tuning to activate a pre-trained masked language model as a neural knowledge base for type inference (the type inference model), and adopts a fill-in-the-blank strategy to infer types. Compared with constructing a symbolic knowledge base, constructing the neural knowledge base incurs no compilation overhead, because the naturalness of code allows it to be treated as plain text. On top of the activated neural knowledge base, this thesis designs two plug-in carriers for the type inference model: an integrated development environment (IDE) plug-in and a web plug-in.

In the experimental section, the proposed type inference model is evaluated systematically from three perspectives: effectiveness, practicality, and capability exploration. The effectiveness experiments demonstrate that the model has low-resource learning ability, achieving excellent type inference performance with only 10% of the data needed for prompt-tuning the masked language model. The practicality experiments show that the model outperforms the latest type inference tools and effectively handles the out-of-vocabulary cases of existing work. Moreover, the capability-exploration experiments demonstrate the model's generalization capability (across different programming languages) and its mixed-language capability (providing a uniform type inference model for different programming languages).
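The contrast between the two strategies can be sketched as follows. This is an illustrative sketch only: the knowledge-base entries, function names, and the cloze-prompt template are invented for the example and are not the thesis's actual implementation.

```python
# 1) "Dictionary lookup": a symbolic knowledge base maps simple type names
#    to fully qualified names, and is limited to what was parsed from
#    compilable code files.
symbolic_kb = {
    "ArrayList": "java.util.ArrayList",
    "HashMap": "java.util.HashMap",
}

def lookup_fqn(simple_name):
    """Keyword matching: returns None (the out-of-vocabulary case)
    when the name was never stored in the knowledge base."""
    return symbolic_kb.get(simple_name)

# 2) "Fill-in-the-blank": instead of a lookup, build a cloze prompt whose
#    masked slot a pre-trained masked language model would be prompted to
#    fill with the fully qualified name.
def cloze_prompt(simple_name, snippet):
    return f"{snippet} In this snippet, {simple_name} refers to [MASK]."

print(lookup_fqn("ArrayList"))      # stored in the KB -> java.util.ArrayList
print(lookup_fqn("ImmutableList"))  # OOV: not in the KB -> None
print(cloze_prompt("ImmutableList",
                   "List<String> xs = ImmutableList.of();"))
```

In the dictionary-lookup case, the OOV name simply yields a null result; in the fill-in-the-blank case, the same OOV name still produces a well-formed query for the model, which is the property the thesis exploits.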
Keywords/Search Tags:Code snippet, Type Inference, Fully-qualified name, Pre-trained masked language model, Prompt tuning