
Hybrid Code Representation For Functional Clone Detection

Posted on: 2023-03-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Hua
Full Text: PDF
GTID: 1528306908468184
Subject: Information management and information systems
Abstract/Summary:
Code cloning, which reuses a fragment of source code via copy-and-paste with or without modifications, is a common approach to code reuse and software prototyping. Under limited development time or with limited programming experience, software developers sometimes copy and paste code fragments to meet project deadlines, leaving a large number of code clones in code repositories. Unfortunately, duplicated code fragments often degrade software quality and incur very high maintenance costs. To date, existing clone detectors that rely on shallow textual or syntactic features to identify code similarity remain ineffective at accurately finding sophisticated functional code clones in real-world code bases. In practice, code fragments that are similar at the semantic level are ubiquitous in software projects, yet detecting them automatically with existing tools remains challenging, and the detection capacity of those tools is still limited. Most traditional clone detection approaches based on static code analysis cannot deliver satisfactory functional clone detection. With the rapid development of natural language processing in recent years, several deep learning based approaches have made significant progress toward functional code clone detection. In this thesis, we apply and study several deep learning based neural networks to further improve code clone detection performance.

The main contributions and innovations of this thesis are: (1) By fusing multi-modal code representations, we found that the deep learning model can accurately learn the important information in the code text. After more semantic information is incorporated into the code representation, the deep learning models are able to learn functional semantics from the complicated code representations. To the best of our knowledge, we are the first to propose the multi-modal code representation. (2) Higher-level structural information of a code fragment can be extracted using graph convolutional networks. (3) Equipped with attention mechanisms, the deep learning based models improve, and the performance of the code clone detectors in terms of F1 increases significantly. We studied different attention mechanisms for each of the three code representations individually to investigate their impact on the final clone detection accuracy, and we present the experimental details and results.

We have conducted extensive experiments on BigCloneBench (a Java code clone data set) and the OJ clone data sets to verify our work. For subsequent researchers, all resources, including the source code of the models and the data sets used in this work, are open-sourced.
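
The abstract does not specify how the representations are fused or scored, so the following is only a minimal sketch under stated assumptions: each code fragment is assumed to already have three equally sized embedding vectors (one per representation), a hypothetical query vector produces softmax attention weights over them, two fused vectors are compared by cosine similarity, and F1 is computed from binary clone predictions. All function names and the query vector are illustrative and are not taken from the thesis.

```python
import math


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(modalities, query):
    """Attention-weighted sum of modality vectors (hypothetical fusion step).

    Each modality vector is scored against `query`; the softmax of those
    scores gives the weight of that modality in the fused vector.
    """
    weights = softmax([dot(vec, query) for vec in modalities])
    dim = len(query)
    return [sum(w * vec[i] for w, vec in zip(weights, modalities))
            for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def f1_score(predicted, actual):
    """F1 over paired boolean clone predictions and ground-truth labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In such a setup, a pair of fragments would be reported as a clone when the cosine similarity of their fused vectors exceeds a chosen threshold; the threshold, like everything else in the sketch, is an assumption rather than a value from the thesis.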
Keywords/Search Tags: Code clone detection, code representation, deep neural network, attention mechanism