| As feature requirements grow, software systems grow in scale. To improve development efficiency, programmers often reuse code from code repositories or mature development frameworks, which leads to code clones. A semantic code clone is a pair of code fragments that implement the same functionality in different ways. Semantic code clone detection can be used for code vulnerability detection and code refactoring. Current code clone detection methods usually represent code semantics with abstract syntax trees or textual information, which captures insufficient semantic information and yields inaccurate semantic markers. To detect semantic code clones effectively, this paper first investigates the shortcomings of current research on semantic code clone detection and analyzes the textual features of semantic code clones. Based on these findings, it designs a semantic graph-based method for detecting semantic code clones. The method extracts semantic information such as data flow and control flow from the intermediate representation of the code, constructs a semantic graph, and maps the semantic graph into a high-dimensional vector space with a graph matching network, so that the similarities and differences between code semantics can be compared more accurately. The main contributions of this paper are: (1) We investigate the features of semantic code clones. We first study the shortcomings of existing deep learning-based code clone detection methods when detecting semantic code clones and analyze the main factors affecting the detection results. We then examine the features of semantic code clones on two publicly released semantic code clone datasets, SeSaMe and GCJ, and analyze which information in semantic code clones is semantically relevant and which is not. (2) We propose a semantic graph-based method for detecting semantic code clones. Using the intermediate representation of the code, we extract key semantic information such as data flow and control flow and combine it with the observed features of semantic code clones to construct a semantic graph that represents code semantics more accurately. We then use a graph matching network to compare the similarities and differences between semantic graphs more directly and thereby detect semantic code clones. (3) To evaluate the effectiveness and robustness of the method in detecting semantic code clones, we construct CF-500, the largest semantic code clone dataset, which contains 23,416 code samples covering 500 different semantics. The results show that the method outperforms the baseline method in accuracy, recall, and robustness when detecting semantic code clones. |
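
The idea of a semantic graph built from data flow and control flow can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy three-address instruction format `(dest, op, operands)`, the function names `build_semantic_graph` and `similarity`, and the restriction to straight-line code. In particular, the multiset Jaccard score over labelled edges is only a simple stand-in for the learned comparison performed by a graph matching network.

```python
from collections import Counter

def build_semantic_graph(instructions):
    """Build a multiset of labelled edges from a toy straight-line IR.

    Each instruction is (dest, op, operands). Nodes are labelled by opcode
    only, so variable names do not influence the graph.
    """
    edges = Counter()
    last_def = {}      # variable name -> opcode of the instruction defining it
    prev_op = None
    for dest, op, operands in instructions:
        # Control-flow edge: previous instruction to this one.
        if prev_op is not None:
            edges[(prev_op, "cf", op)] += 1
        # Data-flow edges: defining instruction to each use of its result.
        for name in operands:
            if name in last_def:
                edges[(last_def[name], "df", op)] += 1
        if dest is not None:
            last_def[dest] = op
        prev_op = op
    return edges

def similarity(graph_a, graph_b):
    """Multiset Jaccard similarity of two edge multisets (hypothetical stand-in)."""
    union = sum((graph_a | graph_b).values())
    if union == 0:
        return 1.0
    return sum((graph_a & graph_b).values()) / union

# Two fragments with the same behaviour but different variable names.
snippet_a = [("t1", "add", ["a", "b"]), ("t2", "mul", ["t1", "2"]), (None, "ret", ["t2"])]
snippet_b = [("x", "add", ["p", "q"]), ("y", "mul", ["x", "2"]), (None, "ret", ["y"])]
# A fragment with different behaviour.
snippet_c = [("t1", "add", ["a", "b"]), (None, "ret", ["t1"])]

graph_a = build_semantic_graph(snippet_a)
graph_b = build_semantic_graph(snippet_b)
graph_c = build_semantic_graph(snippet_c)

print(similarity(graph_a, graph_b))  # prints 1.0
print(similarity(graph_a, graph_c))  # prints 0.0
```

Because nodes carry only opcodes, renaming variables leaves the graph unchanged, which is the kind of semantically unrelated information the abstract says should not affect detection. A real implementation would also need branch and loop edges and a trained model in place of the fixed similarity score.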