
Program Type Inference Based On Pre-trained Mask Language Model

Posted on: 2024-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Yuan
Full Text: PDF
GTID: 2568307112976799
Subject: Electronic information
Abstract/Summary:
Code snippets often involve undeclared receiver objects and non-fully-qualified names. Resolving these undeclared receiver objects and non-fully-qualified names into their corresponding fully qualified names (i.e., type inference) is a prerequisite for efficiently using the knowledge in code snippets. To infer the fully qualified name (FQN) of a type in a code snippet, existing work builds a symbolic knowledge base and adopts a keyword-matching "dictionary lookup" strategy. However, constructing a symbolic knowledge base depends on parsing compilable code files, and this compilation overhead limits the number of fully qualified names and code contexts the knowledge base can store. When type inference is performed with the keyword-matching strategy, the limited knowledge in the symbolic knowledge base causes an out-of-vocabulary (OOV) problem: a lookup for a fully qualified name that is not stored in the knowledge base returns a null value.

To solve the out-of-vocabulary problem in the type inference task, this thesis adopts prompt tuning to activate a pre-trained masked language model as a neural knowledge base for type inference (the type inference model), and adopts a fill-in-the-blank strategy to infer types. Compared with constructing a symbolic knowledge base, constructing the neural knowledge base incurs no compilation overhead, because the naturalness of code allows it to be treated as plain text. On top of the activated neural knowledge base, this thesis designs two plug-in carriers for the type inference model: an integrated development environment (IDE) plug-in and a web plug-in.

In the experimental section, the proposed type inference model is evaluated systematically from three perspectives: effectiveness, practicality, and capability exploration. The effectiveness experiments demonstrate that the model has low-resource learning ability, achieving excellent type inference performance with only 10% of the data needed for prompt-tuning the masked language model. The practicality experiments show that the model outperforms the latest type inference tools and effectively handles the out-of-vocabulary cases of existing work. Moreover, the capability-exploration experiments demonstrate the model's generalization capability (across different programming languages) and its mixed-language capability (providing a uniform type inference model for different programming languages).
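The contrast between the two strategies can be sketched as follows. This is an illustrative sketch only: the knowledge-base entries, function names, and the cloze-prompt template are invented for the example and are not the thesis's actual implementation.

```python
# 1) "Dictionary lookup": a symbolic knowledge base maps simple type names
#    to fully qualified names, and is limited to what was parsed from
#    compilable code files.
symbolic_kb = {
    "ArrayList": "java.util.ArrayList",
    "HashMap": "java.util.HashMap",
}

def lookup_fqn(simple_name):
    """Keyword matching: returns None (the out-of-vocabulary case)
    when the name was never stored in the knowledge base."""
    return symbolic_kb.get(simple_name)

# 2) "Fill-in-the-blank": instead of a lookup, build a cloze prompt whose
#    masked slot a pre-trained masked language model would be prompted to
#    fill with the fully qualified name.
def cloze_prompt(simple_name, snippet):
    return f"{snippet} In this snippet, {simple_name} refers to [MASK]."

print(lookup_fqn("ArrayList"))      # stored in the KB -> java.util.ArrayList
print(lookup_fqn("ImmutableList"))  # OOV: not in the KB -> None
print(cloze_prompt("ImmutableList",
                   "List<String> xs = ImmutableList.of();"))
```

In the dictionary-lookup case, the OOV name simply yields a null result; in the fill-in-the-blank case, the same OOV name still produces a well-formed query for the model, which is the property the thesis exploits.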
Keywords/Search Tags:Code snippet, Type Inference, Fully-qualified name, Pre-trained masked language model, Prompt tuning