Prediction Of Code Function Names Based On Natural Language Processing

Posted on:2021-09-09

Degree:Master

Type:Thesis

Country:China

Candidate:X Hu

Full Text:PDF

GTID:2518306548481814

Subject:Computer technology

Abstract/Summary:

Function name prediction is an important downstream task in code analysis. A good function name makes a program easier to understand and helps developers read code written by others, which is essential for extending and maintaining software products. In recent years, researchers have proposed many function name prediction models. With the development of machine learning, these methods have gradually shifted from traditional code analysis to deep-learning-based code representation, and numerous machine-learning-based function name prediction tools have appeared. However, two problems remain when machine learning models are used for this task. First, they perform poorly on cross-project prediction: common methods can only make predictions within a single project. Second, the limited function name library leads to low prediction accuracy across methods. This thesis therefore proposes a cross-project function name prediction model built on a large function corpus.

We first propose a method for extracting a large-scale function corpus from open-source Git repositories. Using a function extraction tool we designed, we extracted every function from the open-source projects that meet our selection conditions, and then built a large function corpus through data cleaning and function filtering. Next, we use the Skip-gram model from natural language processing to pre-train vector representations of code tokens. To represent code well as vectors (the code2vec task), we propose the AttBiLSTM model, which trains code vectors under function-name supervision. In addition, to speed up training and improve prediction accuracy, we use the TF-IDF algorithm to identify the key tokens in the code and build a set of candidate function names for each token, which improves both model efficiency and prediction accuracy. Finally, we conducted a full experimental comparison on the extracted large-scale function corpus. The experimental results show that, for function name prediction, our method outperforms other advanced methods in both model efficiency and prediction accuracy.
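Skip-gram pre-training learns a vector for each code token by predicting the tokens that appear near it. The following pure-Python sketch illustrates the idea using skip-gram with negative sampling. It is not the thesis's configuration. The window size, dimension, epoch count, and uniform negative sampling are assumptions for illustration; standard word2vec instead draws negatives from a smoothed unigram distribution. The function names are likewise hypothetical.

```python
import math
import random

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def train_skipgram(corpus, dim=16, window=2, negatives=3, epochs=20, lr=0.05, seed=0):
    """Train token vectors with skip-gram + negative sampling (uniform negatives, a simplification)."""
    rng = random.Random(seed)
    vocab = sorted({tok for seq in corpus for tok in seq})
    index = {tok: i for i, tok in enumerate(vocab)}
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]  # token vectors
    w_out = [[0.0] * dim for _ in vocab]                                      # context vectors
    for _ in range(epochs):
        for seq in corpus:
            for center, context in skipgram_pairs(seq, window):
                c = index[center]
                # One positive context (label 1) plus `negatives` random tokens (label 0).
                samples = [(index[context], 1.0)]
                samples += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                grad_in = [0.0] * dim
                for o, label in samples:
                    score = sum(a * b for a, b in zip(w_in[c], w_out[o]))
                    g = lr * (label - sigmoid(score))  # log-likelihood gradient w.r.t. score, times lr
                    for k in range(dim):
                        grad_in[k] += g * w_out[o][k]   # uses the context vector before its update
                        w_out[o][k] += g * w_in[c][k]
                for k in range(dim):
                    w_in[c][k] += grad_in[k]
    return {tok: w_in[index[tok]] for tok in vocab}
```

For a real corpus, a library implementation such as gensim's `Word2Vec` with `sg=1` (skip-gram mode) would be the practical choice. This loop only illustrates the update rule applied to code-token sequences.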
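The abstract names the AttBiLSTM model but does not give its attention formulation. This sketch assumes one common design. A bidirectional LSTM runs over the pre-trained token vectors. The forward and backward hidden states at each step are concatenated. Attention weights then pool those states into a single code vector. Only the pooling step is shown, in pure Python, and the BiLSTM outputs are assumed to be already computed. The names `concat_bidirectional` and `attention_pool` and the learned `query` vector are hypothetical.

```python
import math

def concat_bidirectional(forward, backward):
    """Concatenate forward and backward LSTM outputs at each time step (both aligned by position)."""
    return [f + b for f, b in zip(forward, backward)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each step by softmax(query . h_t); return the weighted sum and the weights."""
    scores = [sum(q * h for q, h in zip(query, h_t)) for h_t in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    code_vector = [sum(a * h_t[k] for a, h_t in zip(alphas, hidden_states)) for k in range(dim)]
    return code_vector, alphas
```

Under this assumed design, the pooled vector would feed a classifier over function names during training. The name labels would therefore supervise both the attention weights and the BiLSTM.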
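The abstract says TF-IDF identifies key tokens and that a candidate name set is built for each token, but it gives no details. The sketch below follows one plausible reading of that step:

1. Treat each function body as a document.
2. Take its top-k TF-IDF tokens as its key tokens.
3. Map every key token to the names of the training functions it was key for.
4. At prediction time, restrict the model to the union of candidates for the query function's key tokens.

The plain `tf * log(N / df)` weighting, the value of `k`, and all function names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def tfidf(docs):
    """TF-IDF weight of each token in each document (a document is a list of tokens)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # A token present in every document gets weight 0 (log(n / n) = 0).
        weights.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

def key_tokens(weights, k=2):
    """Top-k tokens by weight; ties broken alphabetically for determinism."""
    return [t for t, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

def build_candidate_index(functions, k=2):
    """functions: list of (name, body_tokens). Map each key token to the names it was key for."""
    weights = tfidf([body for _, body in functions])
    index = defaultdict(set)
    for (name, _), w in zip(functions, weights):
        for tok in key_tokens(w, k):
            index[tok].add(name)
    return index

def candidate_names(index, query_key_tokens):
    """Union of candidate names for the query function's key tokens."""
    cands = set()
    for tok in query_key_tokens:
        cands |= index.get(tok, set())
    return cands
```

Under this reading, a classifier would score only a few candidate names instead of the whole name vocabulary. That is one way such candidate sets could account for the efficiency gain the abstract reports.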

Keywords/Search Tags:

Function name prediction, Machine learning, TF-IDF algorithm, Code representation, AttBiLSTM model

Related items

1	Function Representation Learning By Leveraging Both Source Code And Binary Code
2	Research On Key Techniques Of Text Representation Learning For Stock Market Prediction
3	Research On Software Defect Prediction Based On Code Representation
4	Research On Representation Learning And Prediction Model Of Crowd Movement Trajectory Based On Deep Learning
5	Research On Knowledge Graph Representation Learning And Its Application In Stock Price Prediction
6	Research On Recognizing Functions In Binary Code Of ARM Platform Based On Machine Learning
7	Research On Tensor Learning Algorithm And Its Application On Disease Prediction
8	Research On Machine Learning Based Software Vulnerability Detection And Optimization Technologies
9	Local prediction and classification techniques for machine learning and data mining
10	Research And Application Of Key Process Parameters Prediction Model Based On Machine Learning