Research And Implementation Of Retrieval Model For Plagiarism Detection

Posted on:2018-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2348330542990827

Subject:Engineering

Abstract/Summary:

With the growth of the Internet, plagiarism has become increasingly serious, and plagiarism detection has become a focus of academic research. Copied material can be obtained in many ways and, more seriously, plagiarism gives rise to intellectual property theft. Because the practice is widely harmful, effective plagiarism detection is needed to deter it. Existing research on plagiarism detection covers three main areas: acquisition of plagiarism corpora, retrieval of plagiarism sources, and text alignment of plagiarized passages. This thesis makes the following contributions in these three areas.

First, plagiarism corpora are currently built mainly by hand, which raises problems of quality and time cost. To address them, the thesis proposes a corpus acquisition method based on a text alignment algorithm, which collects plagiarism data automatically and provides basic data for plagiarism detection research. It presents the acquisition framework and the text alignment algorithm, and reports statistics on and an evaluation of the acquired data.

Second, existing heuristic source retrieval methods lack theoretical support and depend on expert experience. The thesis therefore studies a source retrieval filtering model based on supervised learning. It gives the source retrieval framework and the filtering algorithm, discusses filtering methods based on learning to rank and on classification, and compares in detail how different features affect source retrieval performance. During construction of the filtering model, the features and the supervised learning algorithm with the best retrieval performance are selected.

Third, text alignment methods based on word matching perform well on plagiarism with a low degree of obfuscation but poorly on highly obfuscated plagiarism. To solve this problem, a semantic-based text alignment method is proposed: semantic information is introduced into plagiarism detection, the distributed representation of words is analyzed, and a semantic-based text alignment model is given.

Experiments show that the proposed filtering model and seed search model make up for shortcomings in current work, improve the overall performance of plagiarism detection, and offer new methods and research directions for the filtering step of source retrieval and the seed search step of text alignment.
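As a rough illustration of why word matching struggles with heavily obfuscated plagiarism while a semantic comparison can still find seeds, the sketch below contrasts the two on a toy example. It is not the model from the thesis, which the abstract does not specify: the vector table `TOY_VECTORS`, the functions `lexical_seeds` and `semantic_seeds`, and both thresholds are hypothetical, and averaging word vectors is only one simple way to use distributed word representations.

```python
from math import sqrt

# Hypothetical toy word vectors (assumption); a real system would load
# trained distributed representations instead.
TOY_VECTORS = {
    "car": (0.9, 0.1, 0.0),
    "automobile": (0.85, 0.15, 0.05),
    "fast": (0.1, 0.9, 0.0),
    "quick": (0.15, 0.85, 0.05),
    "drives": (0.0, 0.1, 0.9),
    "moves": (0.05, 0.15, 0.85),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def lexical_seeds(suspicious, source, min_overlap=2):
    """Seed pairs (i, j) whose sentences share at least min_overlap words."""
    seeds = []
    for i, s in enumerate(suspicious):
        for j, t in enumerate(source):
            if len(set(s) & set(t)) >= min_overlap:
                seeds.append((i, j))
    return seeds


def semantic_seeds(suspicious, source, threshold=0.95):
    """Seed pairs (i, j) whose averaged word vectors are close in cosine."""
    seeds = []
    for i, s in enumerate(suspicious):
        sv = sentence_vector(s)
        if sv is None:
            continue
        for j, t in enumerate(source):
            tv = sentence_vector(t)
            if tv is not None and cosine(sv, tv) >= threshold:
                seeds.append((i, j))
    return seeds


# A paraphrase that shares no content words with its source sentence.
suspicious = [["the", "automobile", "moves", "quick"]]
source = [["a", "car", "drives", "fast"], ["the", "weather", "is", "nice"]]

print(lexical_seeds(suspicious, source))
print(semantic_seeds(suspicious, source))
```

On this toy input the word-overlap search finds no seed, while the vector-based search pairs the paraphrase with its source sentence. With only three dimensions the averaged vectors are not very discriminative, so the example shows the idea rather than realistic behavior.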

Keywords/Search Tags:

plagiarism detection, source retrieval, text alignment, corpus acquisition, filtering model, seed search model

Related items

1	Research On Plagiarism Detection Modeling Based On Statistical Machine Learning
2	The Key Technology Research Of Distributed Plagiarism Detection Based On Hadoop
3	Research On Text Plagiarism Detection Methods
4	Research Of Cross-Lingual Plagiarism Detection Mixed Translation And Bilingual Features
5	Research On Parallel Corpus Construction Based On Long Text Alignment And Document-Level Alignment
6	Research On Fast Retrieval Algorithm Chinese Expressions And Sentences Based On Chinese Corpus
7	Source Code Plagiarism Detection Based On Information Retrieval And Stacking Integrated Learning
8	Web-oriented Multilingual Parallel Sentence Pairs Mining Techniques
9	Research On Key Technologies Of Parallel Corpus Construction In Machine Translation Based On Pre-Training Model
10	The Study And Realization Of Paper Plagiarism Identification System Based On The Text Structure