
Research Of Code Clone Detection Algorithm Based On Program Dependence Graph

Posted on: 2019-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: M Wang
Full Text: PDF
GTID: 2428330545452506
Subject: Computer application technology
Abstract/Summary:
Recently, with the flourishing of open source projects, detecting code clones in software systems has become an increasingly important research topic in software engineering. Many downstream real-world applications, such as software refactoring, software maintenance, vulnerability detection and plagiarism detection, take code clone detection as their first step. At present, detecting high-level code clones is still a hard problem. PDG-based techniques, a kind of high-level code clone detector, can be used to detect syntactically similar code. This line of research faces two difficulties: the number of candidate PDG pairs may be large, and subgraph isomorphism testing may take a long time. Therefore, this thesis studies in depth the modification of the PDG structure, characteristic vector filtering, and the clone determination algorithm. The main content and contributions include:

(1) We propose and implement a PDG-based code clone detector, CCSharp. Existing PDG-based clone detection research suffers from problems such as large PDG scale and numerous candidate PDG pairs. To address these problems, we propose a PDG structure modification and a candidate pair filtering algorithm. First, we modify the PDGs generated by Frama-C by removing and merging nodes, which shrinks the graphs and thus reduces the time cost of PDG subgraph isomorphism. Then, we design a characteristic vector filtering algorithm on the modified PDGs to exclude the majority of non-cloned PDG pairs. Finally, based on these approaches, we design and implement the PDG-based code clone detector CCSharp. The experimental results show that the PDG modification reduces PDG scale by more than one third on average; the filtering algorithm improves filtering efficiency by hundreds of times compared with the traditional GPALG approach; and CCSharp's accuracy and recall rate reach 91.7%, 99.3% and 91.7%, 89.8% on the less and PostgreSQL datasets, respectively.

(2) We propose a graph-kernel-based code clone detection approach that applies machine learning. With PDG structure modification and characteristic vector filtering, the time cost of our PDG-based code clone detection tool CCSharp is clearly reduced, but time cost remains a bottleneck. We therefore apply graph similarity calculation (graph kernels) and machine learning to the clone detection field. For a single kernel method or a group of kernel methods, we design two kinds of PDG similarity matrices as the input to the machine learning approach. We then label the datasets and use an SVM to train a classification model for PDG clone judgment. The experimental results show that the accuracy of this approach ranges from 70% to 95% and that its time cost is lower than that of the subgraph isomorphism method.
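The idea behind characteristic vector filtering, a cheap check that discards most PDG pairs before the expensive subgraph isomorphism test, can be illustrated with a small sketch. The abstract does not define the vector, so everything below is an assumption made for illustration: the `PDG` class, the choice of features (counts of node kinds and of data/control dependence edges), and the component-wise comparison are hypothetical and are not CCSharp's actual algorithm. The check relies on one simple fact: a labeled graph can only embed in another if each of its label counts is no larger than the corresponding count in the other graph.

```python
from collections import Counter
from itertools import combinations


class PDG:
    """Hypothetical, minimal PDG container (not the Frama-C data model)."""

    def __init__(self, name, nodes, edges):
        self.name = name
        self.nodes = nodes  # dict: node id -> node kind, e.g. "assign"
        self.edges = edges  # list of (src, dst, dep), dep is "data" or "control"


def characteristic_vector(pdg):
    """Count node kinds and dependence-edge kinds (an assumed feature set)."""
    vec = Counter()
    for kind in pdg.nodes.values():
        vec["node:" + kind] += 1
    for _src, _dst, dep in pdg.edges:
        vec["edge:" + dep] += 1
    return vec


def may_contain(small, large):
    """Necessary (not sufficient) condition for a labeled subgraph embedding:
    every feature count of `small` fits within the count in `large`."""
    return all(large[key] >= count for key, count in small.items())


def candidate_pairs(pdgs):
    """Keep only the pairs that pass the vector check in either direction."""
    vectors = {p.name: characteristic_vector(p) for p in pdgs}
    kept = []
    for a, b in combinations(pdgs, 2):
        va, vb = vectors[a.name], vectors[b.name]
        if may_contain(va, vb) or may_contain(vb, va):
            kept.append((a.name, b.name))
    return kept


# Toy input: g extends f with one extra node and edge; h is unrelated.
pdg_f = PDG("f", {1: "decl", 2: "assign", 3: "call"},
            [(1, 2, "data"), (2, 3, "data")])
pdg_g = PDG("g", {1: "decl", 2: "assign", 3: "call", 4: "return"},
            [(1, 2, "data"), (2, 3, "data"), (3, 4, "control")])
pdg_h = PDG("h", {1: "loop", 2: "loop"}, [(1, 2, "control")])

print(candidate_pairs([pdg_f, pdg_g, pdg_h]))
```

Only the surviving pairs would then be handed to a subgraph isomorphism routine. Because the check is a necessary condition for exact embedding, it never rejects a pair in which one whole graph embeds in the other; a detector that accepts approximate or partial matches would need a looser threshold than this strict comparison.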
Keywords/Search Tags: code clone, PDG modification, characteristic vector filtering, subgraph isomorphism, graph kernel