Research On Code Similarity Detection Algorithm Based On Deep Learning

Posted on: 2022-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: E E Chen
Full Text: PDF
GTID: 2518306572997189
Subject: Computer technology
Abstract/Summary:
Software development depends on large amounts of code, and much of that source code can be obtained easily over the Internet. Although this makes programming easier to learn, it also leads to widespread code plagiarism. Convenient, effective, and fast methods for detecting code plagiarism are therefore particularly important today.

Traditional code similarity detection methods mostly rely on program attributes or structural information. Over time, selecting and counting attributes has become increasingly complicated, as has analyzing program structure. Designers of similarity detection methods face a difficult choice: which attributes and structural information to use, and how to use them. To address this difficulty, a code similarity detection method based on deep learning is proposed. The method first converts the code into a time series through preprocessing, and then relies on the learning ability of a neural network to extract features from that time series automatically, without manual feature selection. The extracted features are represented as feature vectors, and the cosine similarity between the final feature vectors of two programs reflects the similarity between those programs. The method can also be adapted to different usage scenarios, such as different programming languages, by changing the training data without modifying the algorithm. Experimental results show that the method achieves good detection performance.

Colleges and universities now commonly offer programming courses, and plagiarism in student homework is frequent. For teachers, judging plagiarism manually is laborious and inaccurate. Based on the deep-learning code similarity detection method described above, a code plagiarism detection system for college C language programming assignments was therefore developed. Users upload code to the system, and the system returns the similarity between the uploaded programs. Trial operation confirmed that the system greatly improved both the efficiency with which teachers grade C language homework and the accuracy of their plagiarism judgments.
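The abstract names two concrete steps that can be illustrated without the neural network itself: turning code into a sequence, and comparing two feature vectors by cosine similarity. The following is a minimal sketch under stated assumptions, not the thesis's implementation. The regex tokenizer and the names `TOKEN_RE`, `to_sequence`, and `cosine_similarity` are hypothetical; the abstract does not describe the actual preprocessing or network architecture, so the learned feature extractor is left out entirely.

```python
import math
import re

# Hypothetical tokenizer: identifiers/keywords, integer literals, and
# single punctuation characters. The thesis does not specify its preprocessing.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def to_sequence(code, vocab):
    """Map source text to a sequence of integer token ids.

    New tokens are added to `vocab` in order of first appearance.
    """
    return [vocab.setdefault(tok, len(vocab)) for tok in TOKEN_RE.findall(code)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)
```

In the method the abstract describes, the vectors passed to `cosine_similarity` would be produced by the trained network from sequences like those returned by `to_sequence`; any vectors used to try the function here are placeholders.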
Keywords/Search Tags: code plagiarism, time series, deep learning, plagiarism detection system