
Research Of Ch-En Cross-Lingual Plagiarism Detection Based On Translation Features And Contents

Posted on: 2012-12-28 | Degree: Master | Type: Thesis
Country: China | Candidate: S X Yuan | Full Text: PDF
GTID: 2218330362959369 | Subject: Communication and Information System
Abstract/Summary:
Research on plagiarism detection for scientific papers written in a single language is well established, and a number of practical systems have been developed. Cross-lingual plagiarism detection, however, has received comparatively little study and has produced few results.

Targeting scientific papers, this thesis discusses the implementation of Chinese-English cross-lingual plagiarism detection. It identifies a set of translation features by mining the internal regularities of Chinese translated text. Using these features, papers suspected of plagiarism can be identified with the Decision Tree algorithm.

The thesis also translates the high-frequency stem words of a Chinese translation into English. Suspected plagiarised papers are then identified by comparing the similarity between this translation and the corresponding articles screened from the corpus.

The thesis introduces the key technologies of natural language processing: text segmentation, POS tagging, syntactic analysis and semantic analysis. It then introduces text classification, including text representation and feature extraction.

Based on these technologies, the thesis presents an algorithm for Chinese-English cross-lingual plagiarism detection based on translation features. In the closed test, the average accuracy rate was close to 95% and the average recall rate reached 97%; in the open test, the average accuracy rate reached 88% and the average recall rate reached 89%. These results indicate that the approach is effective.

Finally, the thesis proposes a content-based algorithm for identifying cross-lingual translation plagiarism. After machine translation, the similarity between the document under examination and the documents in the corpus is compared to determine whether the document is plagiarised. This algorithm also achieved good results.
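
The content-based comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure the thesis uses, so this sketch assumes cosine similarity over term-frequency vectors and assumes the suspect document has already been machine-translated into English. The function names (`term_counts`, `cosine_similarity`, `flag_suspects`) and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def flag_suspects(translated_doc, corpus, threshold=0.8):
    """Return (doc_id, score) pairs at or above the threshold, best first.

    `corpus` maps a document id to its English text. The threshold is an
    assumed cut-off, not a value taken from the thesis.
    """
    query = term_counts(translated_doc)
    scored = [
        (doc_id, cosine_similarity(query, term_counts(text)))
        for doc_id, text in corpus.items()
    ]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Calling `flag_suspects(translated_text, corpus)` returns the corpus documents whose vocabulary overlaps most with the translated suspect document. A real system would likely add stemming, stop-word removal and term weighting before comparison.
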
Keywords/Search Tags:plagiarism detection, cross-language, translation feature, decision tree, SVM, similarity