
Specification Mining Based On Deep Learning

Posted on: 2022-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z Cao
GTID: 2518306605989359
Subject: Master of Engineering
Abstract/Summary:
Every system engineering activity needs a specification, that is, a description of what the system is meant to achieve. Specifications are equally important in software engineering. Mature software specifications help people understand software systems better and thus save a great deal of maintenance cost. Program specifications can assist in debugging and can even serve as the basis for validation tools that check programs for errors automatically, increasing the correctness of a system. Although many automated techniques for specification mining have been proposed in recent years, their mining accuracy still needs improvement. The purpose of this thesis is to use deep learning to mine specifications from method call sequences and to improve on the latest specification mining framework. To this end, test cases are generated with the static scan algorithm proposed in this thesis, and these test cases are executed to produce a trace set that covers as many program execution paths as possible. This trace set is used to train a language model based on a Recurrent Neural Network (RNN). Next, a subset that is representative of all trace samples is selected from the full trace set. A Prefix Tree Acceptor (PTA) is then built from this subset, and the deep learning model is used to extract features. These features are used to merge similar states of the PTA, yielding multiple Finite State Automata (FSAs). The quality of the model is evaluated by computing the F-measure of each automaton.

The main work is as follows:

(1) The Daikon tool does not support Python and cannot generate multiple program traces in one run. To solve this, a Python-based program trace extraction tool, PyTracer, was developed. PyTracer is a debugger-like trace extraction tool built on Python decorators and on the program's dynamic execution. It supports Python, generates multiple program traces in one run, extracts parameter information while the program is running, and encapsulates the extraction of key information.

(2) Test cases generated at random achieve low coverage. To address this, a test case generation algorithm based on static scanning is proposed. First, the source code is scanned, and all parameter inequalities that appear in if conditions are saved, in order, as an inequality group. Next, inequalities in the group are negated and combined to form a new inequality group. Finally, the Z3 solver is used to solve the inequality group and obtain parameter values for the test cases. Experiments show that, for the same number of generated test cases, the static scan algorithm achieves higher coverage of the program space than random generation.

(3) Statistical language models based on RNNs suffer from a distraction problem: when encoding the preceding sequence information, the original model lacks focus and spreads its attention probability unevenly. To address this, a specification mining framework that combines an attention mechanism with model fusion is proposed, and the specifications it mines are more accurate. The new language model adds an attention mechanism on top of the Long Short-Term Memory (LSTM) network, which recomputes the attention probability distribution over the RNN outputs within the lookup step. The adjusted attention distribution gives a more focused understanding of the program's execution. To further improve the model's learning ability, model fusion is applied; it improves the accuracy of next-token prediction and lets the model learn more complex call-sequence relationships. Experiments show that this method improves on the results of the original paper by nearly 20 percentage points, indicating that the framework based on attention and model fusion better captures method invocation relationships in programs.
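The abstract describes PyTracer only as decorator-based and able to emit several traces in one run. The sketch below illustrates that idea under stated assumptions: a decorator records each call's name and bound parameters, and a hypothetical `TraceRecorder` class (not PyTracer's actual API) starts a fresh trace for every test case. The functions `connect`, `send` and `program` are illustrative only.

```python
import functools
import inspect
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class TraceRecorder:
    """Hypothetical helper: collects one call trace per program run."""

    def __init__(self) -> None:
        self.traces: List[List[Event]] = []
        self._current: List[Event] = []

    def traced(self, func: Callable) -> Callable:
        """Decorator: log the function name and its bound arguments on every call."""
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()  # record default values too
            self._current.append({"method": func.__name__,
                                  "params": dict(bound.arguments)})
            return func(*args, **kwargs)

        return wrapper

    def run(self, entry: Callable, test_cases: List[tuple]) -> List[List[Event]]:
        """Execute `entry` once per test case, keeping a separate trace for each run."""
        for case in test_cases:
            self._current = []
            entry(*case)
            self.traces.append(self._current)
        return self.traces


recorder = TraceRecorder()


@recorder.traced
def connect(host, port=80):  # illustrative traced method
    return f"{host}:{port}"


@recorder.traced
def send(conn, data):  # illustrative traced method
    return len(data)


def program(host, payload):
    conn = connect(host)
    if payload:
        send(conn, payload)


traces = recorder.run(program, [("a", "hi"), ("b", "")])
```

Each test case yields its own method call sequence, which is the form of input the language model is trained on.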
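Item (2) can be sketched as follows, with one deliberate substitution: instead of Z3, a small brute-force search over a bounded integer domain finds a satisfying assignment, so the example needs only the standard library. The function names, the domain `range(-20, 21)`, and the choice to negate every subset of conditions are assumptions for illustration; the sketch also treats conditions independently and ignores control dependencies such as early returns.

```python
import ast
import itertools
from typing import Dict, List, Optional


def scan_if_conditions(source: str) -> List[str]:
    """Static scan: collect the condition of every `if` statement, in source order."""
    tree = ast.parse(source)
    ifs = [n for n in ast.walk(tree) if isinstance(n, ast.If)]
    ifs.sort(key=lambda n: n.lineno)
    return [ast.unparse(n.test) for n in ifs]  # ast.unparse needs Python 3.9+


def inequality_groups(conditions: List[str]) -> List[List[str]]:
    """Form new groups by keeping or negating each condition in turn."""
    groups = []
    for flips in itertools.product([False, True], repeat=len(conditions)):
        groups.append([f"not ({c})" if flip else c
                       for c, flip in zip(conditions, flips)])
    return groups


def solve(group: List[str], params: List[str],
          domain=range(-20, 21)) -> Optional[Dict[str, int]]:
    """Brute-force stand-in for Z3: first assignment satisfying every constraint."""
    for values in itertools.product(domain, repeat=len(params)):
        env = dict(zip(params, values))
        if all(eval(c, {}, env) for c in group):
            return env
    return None


SOURCE = """
def classify(x, y):
    label = []
    if x > 10:
        label.append("big")
    if y <= x:
        label.append("ordered")
    return label
"""

conds = scan_if_conditions(SOURCE)
test_cases = [solve(g, ["x", "y"]) for g in inequality_groups(conds)]
```

With two conditions this produces four parameter cases, one for each combination of branch outcomes, whereas random sampling may miss some of them.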
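The abstract does not give the attention scoring function used on top of the LSTM. As a minimal sketch, assume plain dot-product attention with the last LSTM output as the query: the outputs inside the lookup window are scored, the scores are normalized with a softmax, and the weighted sum forms a context vector. The vectors below stand in for real LSTM outputs, and `dot_attention` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(outputs: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Re-weight RNN outputs in the window by their dot product with the query."""
    scores = [sum(o_i * q_i for o_i, q_i in zip(o, query)) for o in outputs]
    weights = softmax(scores)
    dim = len(outputs[0])
    context = [sum(w * o[d] for w, o in zip(weights, outputs)) for d in range(dim)]
    return weights, context


# Stand-in LSTM outputs for a lookup window of three steps.
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = dot_attention(outputs, query=outputs[-1])
```

Steps whose outputs align with the current state receive more weight, which is the "more focused" distribution the abstract refers to.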
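The abstract also does not say how the models are fused. One common form, assumed here, is a weighted average of the next-token distributions predicted by each model, after which the most probable token is taken as the prediction. The distributions and the `fuse` function are hypothetical.

```python
from typing import Dict, List, Optional

Dist = Dict[str, float]


def fuse(dists: List[Dist], weights: Optional[List[float]] = None) -> Dist:
    """Weighted average of next-token distributions; missing tokens count as 0."""
    if weights is None:
        weights = [1.0] * len(dists)
    total = sum(weights)
    vocab = set().union(*dists)
    return {tok: sum(w * d.get(tok, 0.0) for w, d in zip(weights, dists)) / total
            for tok in sorted(vocab)}


# Hypothetical predictions after the prefix "open read".
lstm_next = {"read": 0.6, "close": 0.4}
attention_next = {"read": 0.2, "close": 0.7, "<END>": 0.1}

fused = fuse([lstm_next, attention_next])
prediction = max(fused, key=fused.get)
```

Equal weights are used here; in practice the weights would be tuned on held-out traces.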
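The PTA construction and state merging described earlier can be sketched as below. In the thesis the merge criterion comes from features extracted by the trained RNN. Here, as a stand-in assumption, a state's feature is simply the set of calls observed next from it, and states with equal features are merged. Different criteria would yield different automata. The merged automaton may be nondeterministic.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Trace = List[str]
Transitions = Dict[int, Dict[str, int]]


def build_pta(traces: List[Trace]) -> Tuple[Transitions, Set[int]]:
    """Prefix Tree Acceptor: state 0 is the root; each distinct prefix gets one state."""
    delta: Transitions = {0: {}}
    accepting: Set[int] = set()
    for trace in traces:
        state = 0
        for call in trace:
            if call not in delta[state]:
                new = len(delta)
                delta[state][call] = new
                delta[new] = {}
            state = delta[state][call]
        accepting.add(state)
    return delta, accepting


def merge_states(delta: Transitions, accepting: Set[int],
                 feature: Callable[[int], Hashable]):
    """Merge PTA states with equal features; return (transitions, finals, initial)."""
    block_ids: Dict[Hashable, int] = {}
    block_of: Dict[int, int] = {}
    for s in sorted(delta):
        block_of[s] = block_ids.setdefault(feature(s), len(block_ids))
    merged: Set[Tuple[int, str, int]] = set()
    for s, out in delta.items():
        for call, t in out.items():
            merged.add((block_of[s], call, block_of[t]))
    return merged, {block_of[s] for s in accepting}, block_of[0]


traces = [["open", "read", "close"], ["open", "read", "read", "close"], ["open", "close"]]
delta, accepting = build_pta(traces)
# Stand-in feature: calls observed next from each state (the thesis uses RNN features).
fsa, final, initial = merge_states(delta, accepting, lambda s: frozenset(delta[s]))
```

The seven-state prefix tree collapses to four states, and the merged automaton now accepts repeated `read` calls that no single trace spelled out.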
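The abstract states only that each automaton is scored by its F-measure. A common setup in specification mining, assumed here, compares a mined automaton against a reference one: precision is the share of traces sampled from the mined automaton that the reference accepts, recall is the share of reference traces that the mined automaton accepts, and F = 2PR / (P + R). The automata and sample traces below are illustrative.

```python
from typing import List, Set, Tuple

Transition = Tuple[int, str, int]
Automaton = Tuple[Set[Transition], int, Set[int]]  # (transitions, initial, finals)


def accepts(fsa: Automaton, trace: List[str]) -> bool:
    """Run a possibly nondeterministic automaton, tracking every reachable state."""
    transitions, initial, final = fsa
    current = {initial}
    for call in trace:
        current = {t for (s, c, t) in transitions if s in current and c == call}
        if not current:
            return False
    return bool(current & final)


def f_measure(mined: Automaton, reference: Automaton,
              mined_samples: List[List[str]], reference_samples: List[List[str]]) -> float:
    # Precision: traces produced by the mined model that the reference accepts.
    precision = sum(accepts(reference, t) for t in mined_samples) / len(mined_samples)
    # Recall: reference traces that the mined model accepts.
    recall = sum(accepts(mined, t) for t in reference_samples) / len(reference_samples)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reference: Automaton = ({(0, "open", 1), (1, "read", 1), (1, "close", 2)}, 0, {2})
# Over-general mined model: it also allows closing twice.
mined: Automaton = ({(0, "open", 1), (1, "read", 1), (1, "close", 2), (2, "close", 2)}, 0, {2})

score = f_measure(
    mined, reference,
    mined_samples=[["open", "close"], ["open", "read", "close"], ["open", "close", "close"]],
    reference_samples=[["open", "close"], ["open", "read", "read", "close"]],
)
```

Here precision is 2/3 and recall is 1, so the F-measure is 0.8, penalizing the extra `close` transition.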
Keywords/Search Tags: Specification Mining, Deep Learning, Attention, Model Fusion, Trace Extraction