Clone Code Detection Based On Image Similarity

Posted on: 2021-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
GTID: 2428330620967469
Subject: Computer application technology
Abstract/Summary:
Clone code refers to two or more source code fragments in a code base that are identical, or that differ in structure but implement the same functionality. During software development and maintenance, developers often copy and paste code or build on development frameworks, so a large amount of clone code appears in software systems. Cloning has some benefits: reusing existing code avoids the risks of rewriting it and improves development efficiency. However, studies have shown that a large amount of clone code harms a software system. For example, reusing a code fragment that contains an undiscovered error can propagate the defect and reduce the stability of the system. In addition, if the existing clones in a system are not well managed, the code base keeps expanding, which causes redundancy and increases maintenance costs. Detecting, identifying, and presenting the clone code in a software system is therefore a central task in clone code research. To eliminate clone code and reduce its negative effects, current clone detection research mainly relies on five code representations: text, token, tree, metric, and graph. However, detection effectiveness has seen few breakthroughs for a long time. To address this problem, and inspired by image processing, this thesis proposes a new clone detection approach based on image similarity (Clone Code Detection Based on Image Similarity, CCIS), which aims to help developers identify clone code and better understand the valuable information it contains. The main work includes:

(1) Pre-processing the source code to be checked for clones. This mainly includes removing information that is irrelevant to clone detection, such as whitespace, single-line comments, and multi-line comments; standardizing the code format; and identifying and extracting source code at function granularity. To detect clones more accurately, the function extraction in this thesis also takes loop nesting depth into account, which provides a basis for clone detection.

(2) Starting from the function-level code fragments, the code is highlighted according to the weights of keywords, data types, function names, identifiers, and numbers or strings, and each fragment is converted into a PNG (Portable Network Graphics) image. Each code image is represented as an 8-bit grayscale image and read into memory as a two-dimensional m × n matrix, which reduces the amount of computation in the later similarity detection.

(3) Converting each image into a negative image, i.e. inverting its colour values, and resizing the images so that they all have the same pixel dimensions and aspect ratio, which is the precondition for the subsequent image similarity detection. Besides resizing, this thesis also applies an image blur filter to normalize the content of the code images and measures how the filter affects detection performance.

(4) Based on the normalized code images, the Jaccard distance and the perceptual hash algorithm are used to detect similarity; the detection results are stored in an XML file, and the detected clone code information is visualized.

To verify the validity of the approach, an evaluation data set built from six open-source software systems was used for testing. The experimental results show that the image-based clone detection and visualization approach detects 100% of type-1 clones, 88% of type-2 clones, and 60% of type-3 clones (type-1 clones are identical apart from whitespace, layout, and comments; type-2 clones may additionally differ in identifiers, literals, or types; type-3 clones may also have statements added, removed, or modified), which indicates that the method is effective for clone code detection.
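A minimal sketch of the comment and whitespace removal in step (1), assuming C-like comment syntax (// and /* ... */). The names preprocess and _COMMENT are hypothetical, and comment markers inside string literals are deliberately not handled, so this is an illustration rather than the thesis's implementation.

```python
import re

# Hypothetical helper: assumes C-like comment syntax (C, C++, Java).
# Comment markers that appear inside string literals are NOT handled here.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def preprocess(source: str) -> str:
    """Strip comments, collapse whitespace, and drop blank lines."""
    # re.sub scans left to right, so whichever comment starts first is
    # removed first; both kinds are replaced by a single space.
    no_comments = _COMMENT.sub(" ", source)
    lines = []
    for line in no_comments.splitlines():
        normalized = " ".join(line.split())  # collapse runs of spaces/tabs
        if normalized:                       # skip lines that became empty
            lines.append(normalized)
    return "\n".join(lines)


print(preprocess("int a; // counter\n/* multi\n   line */\n\n   int   b;\t\n"))
# prints:
# int a;
# int b;
```

A production pre-processor would tokenize the source (or use a real parser) so that string literals containing // or /* survive intact.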
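Function-level extraction and loop nesting depth, also part of step (1), can be approximated with brace matching. This is a hypothetical heuristic for brace-delimited languages such as C or Java, not the method described in the thesis (the abstract does not say how functions are extracted): extract_functions and loop_nesting_depth are illustrative names, loops written without braces are not counted, and braces inside string or character literals will confuse it.

```python
import re

# Hypothetical heuristics for brace-delimited languages (C, C++, Java).
# Braces inside string or character literals are not handled.
_HEADER = re.compile(r"\b([A-Za-z_]\w*)\s*\([^;{}()]*\)\s*\{")
_NOT_FUNCTIONS = {"if", "for", "while", "switch", "catch", "try", "synchronized"}
_LOOP_TOKENS = re.compile(r"\b(?:for|while|do)\b|[{}();]")


def extract_functions(code: str) -> list[tuple[str, str]]:
    """Return (name, source) pairs for function or method definitions."""
    functions = []
    pos = 0
    while (match := _HEADER.search(code, pos)) is not None:
        if match.group(1) in _NOT_FUNCTIONS:  # control-flow header, not a function
            pos = match.end()
            continue
        depth, end = 0, None
        for j in range(match.end() - 1, len(code)):  # starts at the "{"
            if code[j] == "{":
                depth += 1
            elif code[j] == "}":
                depth -= 1
                if depth == 0:
                    end = j + 1
                    break
        if end is None:  # unbalanced braces: give up on the rest
            break
        start = code.rfind("\n", 0, match.start()) + 1  # include the return type
        functions.append((match.group(1), code[start:end]))
        pos = end  # skip the body so nested blocks are not re-scanned
    return functions


def loop_nesting_depth(body: str) -> int:
    """Maximum number of nested braced loop bodies (for / while / do)."""
    stack = []       # one flag per open "{": True if it opened a loop body
    pending = False  # a loop keyword was seen and its "{" may follow
    parens = 0       # ";" inside "for (...;...;...)" must not reset pending
    deepest = 0
    for tok in _LOOP_TOKENS.findall(body):
        if tok in ("for", "while", "do"):
            pending = True
        elif tok == "(":
            parens += 1
        elif tok == ")":
            parens -= 1
        elif tok == ";" and parens == 0:
            pending = False  # single-statement loop or end of a do-while
        elif tok == "{":
            stack.append(pending)
            pending = False
            deepest = max(deepest, sum(stack))
        elif tok == "}" and stack:
            stack.pop()
    return deepest


sample = ("int f(int n) {\nfor (int i = 0; i < n; i++) {\nwhile (n > 0) {\nn--;\n}\n}\n"
          "return n;\n}\nint g() {\nreturn 1;\n}")
for name, body in extract_functions(sample):
    print(name, loop_nesting_depth(body))  # "f 2", then "g 0"
```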
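Step (2) turns highlighted code into an 8-bit grayscale PNG. The abstract gives neither the token weights nor the rendering method, so the sketch below rests on two assumptions: each character becomes a single pixel (instead of a font-rendered glyph), and the WEIGHTS, KEYWORDS, and TYPES tables are invented for illustration. It shows how token categories can map to gray levels in an m × n matrix, and how such a matrix can be encoded as a grayscale PNG using only the standard zlib and struct modules.

```python
import re
import struct
import zlib

# Hypothetical weights: higher weight -> darker ink on a white (255) background.
WEIGHTS = {"keyword": 255, "type": 210, "function": 170,
           "identifier": 130, "literal": 90, "other": 50}
KEYWORDS = {"if", "else", "for", "while", "do", "return", "switch", "case",
            "break", "continue", "new", "class", "public", "private", "static"}
TYPES = {"int", "long", "short", "char", "float", "double", "boolean", "bool",
         "void", "String"}
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'    # string literal
    r"|'(?:\\.|[^'\\])*'"   # character literal
    r"|\d+(?:\.\d+)?"       # number
    r"|[A-Za-z_]\w*"        # identifier or keyword
    r"|\S"                  # any other non-space character
)


def _category(line: str, m: re.Match) -> str:
    text = m.group()
    if text[0] in {'"', "'"} or text[0].isdigit():
        return "literal"
    if text[0].isalpha() or text[0] == "_":
        if text in KEYWORDS:
            return "keyword"
        if text in TYPES:
            return "type"
        if line[m.end():].lstrip().startswith("("):  # name followed by "("
            return "function"
        return "identifier"
    return "other"


def code_to_gray_matrix(code: str) -> list[list[int]]:
    """One pixel per character cell: 255 = background, darker = heavier token."""
    lines = code.splitlines() or [""]
    width = max(1, max(len(line) for line in lines))
    image = [[255] * width for _ in lines]
    for row, line in enumerate(lines):
        for m in _TOKEN.finditer(line):
            shade = 255 - WEIGHTS[_category(line, m)]
            for col in range(m.start(), m.end()):
                image[row][col] = shade
    return image


def to_png_gray(image: list[list[int]]) -> bytes:
    """Encode an m x n matrix of 0..255 values as an 8-bit grayscale PNG."""
    def chunk(tag: bytes, data: bytes) -> bytes:
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    height, width = len(image), len(image[0])
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale), no interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + bytes(row) for row in image)  # filter type 0 per row
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


print(code_to_gray_matrix("x = 1;"))  # [[125, 255, 205, 255, 165, 205]]
```

Writing the result of to_png_gray to a file opened in binary mode produces a viewable PNG; rendering real glyphs with an imaging library would be closer to what the abstract describes.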
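The negative-image and resizing part of step (3) reduces to two small matrix operations on an 8-bit grayscale image. Nearest-neighbour resampling and any particular target size are assumptions here; the abstract only says that all images are brought to the same size and proportions.

```python
def negative(image: list[list[int]]) -> list[list[int]]:
    """Invert an 8-bit grayscale image: v -> 255 - v."""
    return [[255 - v for v in row] for row in image]


def resize_nearest(image: list[list[int]], height: int, width: int) -> list[list[int]]:
    """Nearest-neighbour resampling to a fixed height x width."""
    src_h, src_w = len(image), len(image[0])
    return [[image[r * src_h // height][c * src_w // width] for c in range(width)]
            for r in range(height)]


print(negative([[0, 255], [100, 55]]))   # [[255, 0], [155, 200]]
print(resize_nearest([[1, 2], [3, 4]], 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

For a code image with dark text on a white background, resize_nearest(negative(image), 64, 64) would give a 64 × 64 matrix (an arbitrary, hypothetical size) with bright text on a dark background, so every fragment can be compared pixel by pixel.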
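The abstract does not name the blur filter used in step (3). As a stand-in, this sketch applies a box (mean) filter with replicated borders; a Gaussian blur would be an equally plausible choice. The box_blur name, the radius parameter, and the round-half-up integer rounding are assumptions.

```python
def box_blur(image: list[list[int]], radius: int = 1) -> list[list[int]]:
    """Mean filter over a (2r+1) x (2r+1) window, replicating edge pixels."""
    h, w = len(image), len(image[0])
    size = (2 * radius + 1) ** 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), h - 1)      # clamp row to the image
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), w - 1)  # clamp column to the image
                    total += image[yy][xx]
            row.append((total + size // 2) // size)  # rounded integer mean
        out.append(row)
    return out


# A single bright pixel is spread evenly over its 3 x 3 neighbourhood.
print(box_blur([[0, 0, 0], [0, 90, 0], [0, 0, 0]]))
# [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
```

Larger radii smooth more aggressively, trading sensitivity to small textual differences for tolerance of them; the right setting is exactly the kind of effect on detection performance that step (3) says is measured.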
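For the Jaccard distance in step (4), d_J(A, B) = 1 - |A ∩ B| / |A ∪ B|, the images first have to be treated as sets. The abstract does not say how this is done; the sketch below assumes a binarization threshold of 128 on the negative image, so each set holds the coordinates of bright (text) pixels. The jaccard_distance name and the threshold are hypothetical, and a weighted variant over gray levels would be a reasonable alternative.

```python
def jaccard_distance(a: list[list[int]], b: list[list[int]], threshold: int = 128) -> float:
    """1 - |A & B| / |A | B|, where A and B are the pixels >= threshold."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size; resize them first")
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            in_a, in_b = pa >= threshold, pb >= threshold
            inter += int(in_a and in_b)
            union += int(in_a or in_b)
    return 0.0 if union == 0 else 1 - inter / union  # two blank images are identical


print(jaccard_distance([[255, 0], [255, 255]], [[255, 255], [0, 255]]))  # 0.5
```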
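Step (4) also uses a perceptual hash, but the abstract does not specify the variant. The sketch follows the common DCT-based recipe: shrink the image to 32 × 32, take the 2-D DCT-II, keep the 8 × 8 lowest-frequency coefficients, and set one bit per coefficient that exceeds their median. Nearest-neighbour shrinking (instead of an antialiasing filter), the function names, and the similarity score 1 - hamming / 64 are assumptions made to keep the example self-contained.

```python
import math


def _resize_nearest(image: list[list[int]], size: int) -> list[list[int]]:
    src_h, src_w = len(image), len(image[0])
    return [[image[r * src_h // size][c * src_w // size] for c in range(size)]
            for r in range(size)]


def phash(image: list[list[int]], hash_size: int = 8, dct_size: int = 32) -> int:
    """DCT-based perceptual hash of a grayscale matrix (hash_size**2 bits)."""
    small = _resize_nearest(image, dct_size)
    # cos_table[k][n] = cos(pi * k * (2n + 1) / (2N)): DCT-II basis, low k only
    cos_table = [[math.cos(math.pi * k * (2 * n + 1) / (2 * dct_size))
                  for n in range(dct_size)] for k in range(hash_size)]
    # Separable 2-D DCT-II restricted to the low-frequency block
    rows = [[sum(cos_table[u][x] * small[x][y] for x in range(dct_size))
             for y in range(dct_size)] for u in range(hash_size)]
    coeffs = [sum(rows[u][y] * cos_table[v][y] for y in range(dct_size))
              for u in range(hash_size) for v in range(hash_size)]
    ordered = sorted(coeffs)
    mid = len(ordered) // 2
    median = (ordered[mid - 1] + ordered[mid]) / 2
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (c > median)  # one bit per coefficient
    return bits


def hamming(h1: int, h2: int) -> int:
    return bin(h1 ^ h2).count("1")


def phash_similarity(a: list[list[int]], b: list[list[int]]) -> float:
    return 1 - hamming(phash(a), phash(b)) / 64
```

Because every coefficient is compared with their median, multiplying all pixel values by a positive constant (without clipping) leaves the hash unchanged, which makes the comparison insensitive to global contrast differences between code images.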
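Step (4) stores the detection results in an XML file, but the abstract does not give the schema. The element and attribute names below (clones, clone_pair, fragment) and the ClonePair record are hypothetical; the sketch only shows how such a report can be produced with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ClonePair:
    # Hypothetical record; the thesis's actual XML schema is not given.
    file_a: str
    function_a: str
    file_b: str
    function_b: str
    jaccard_distance: float
    phash_similarity: float


def results_to_xml(pairs: list[ClonePair]) -> str:
    """Serialize detected clone pairs as an XML string."""
    root = ET.Element("clones")
    for p in pairs:
        pair = ET.SubElement(root, "clone_pair",
                             jaccard_distance=f"{p.jaccard_distance:.3f}",
                             phash_similarity=f"{p.phash_similarity:.3f}")
        ET.SubElement(pair, "fragment", file=p.file_a, function=p.function_a)
        ET.SubElement(pair, "fragment", file=p.file_b, function=p.function_b)
    return ET.tostring(root, encoding="unicode")


print(results_to_xml([ClonePair("A.java", "sum", "B.java", "total", 0.125, 0.9)]))
```

To write a file instead of a string, the same tree can be saved with ElementTree.write(path, encoding="utf-8", xml_declaration=True).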
Keywords/Search Tags: clone code, clone detection, blur image filter, Jaccard distance, perceptual hash