Plagiarism Detection Method Based On Multi-Feature Extraction And Multi-Sentence Fusion

Posted on: 2024-02-17

Degree: Master

Type: Thesis

Country: China

Candidate: X L Wang

GTID: 2568307139995749

Subject: Information and Communication Engineering

Abstract/Summary:

With the advent of the big data era, digital resources on the Internet are growing rapidly, and plagiarism is becoming increasingly rampant. Manual detection can no longer cope with the volume of online resources, so research on automatic plagiarism detection has become crucial. Traditional plagiarism detection methods lack deep semantic analysis of the text and cannot handle complex paraphrasing. Recent research, which uses deep learning to extract deep semantic features, has achieved some improvement in plagiarism detection. However, existing methods still have the following shortcomings. First, compared with traditional methods, deep learning-based methods require more processing time and are inefficient when facing massive amounts of data. Second, most existing methods treat the sentence as the unit of plagiarism and compare independent sentences in pairs. This approach ignores contextual information and cannot handle complex cases such as one sentence split into several, several sentences merged into one, or several sentences plagiarized as several other sentences. Third, the sequential nature of text features is not considered, which leads to ineffective feature extraction and integration.

To solve these problems, this thesis proposes a novel plagiarism detection method divided into three stages: paragraph-level, sentence-level, and post-processing. In the paragraph-level stage, a new similarity factor called IPFMGS is designed to measure the similarity between paragraphs. Plagiarized paragraphs are identified by comparing paragraphs with each other, which keeps detection efficient and improves filtering effectiveness. In the sentence-level stage, first, a multi-sentence semantic feature extraction and fusion network uses convolutional neural networks to fuse the semantics of multiple sentences, comprehensively capturing plagiarism features under various complex situations. Second, multiple features are extracted using a single-sentence semantic feature extractor and a vocabulary feature extractor. Third, a Bi-LSTM sequence model fuses the extracted features and combines them with contextual features to detect plagiarized sentences. In the post-processing stage, unlike existing methods that directly merge sentence pairs, this thesis proposes a plagiarism fragment matching algorithm to determine the correspondence between plagiarism fragments. The proposed method was evaluated experimentally on three datasets (PAN12, PAN13, and PAN14), and the results show that it outperforms existing detection methods.
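
The paragraph-level filtering idea can be illustrated with a minimal sketch. The abstract does not define IPFMGS, so the code below substitutes plain Jaccard overlap of word sets as a placeholder similarity; the function names and the 0.3 threshold are assumptions made for illustration and are not taken from the thesis.

```python
import re
from itertools import product


def tokens(text):
    """Return the set of lowercase word tokens in a paragraph."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (a stand-in, NOT the thesis's IPFMGS)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_paragraph_pairs(suspicious, source, threshold=0.3):
    """Compare every suspicious paragraph with every source paragraph.

    Returns (i, j, score) for pairs whose score reaches the (assumed)
    threshold, so that only these pairs go on to sentence-level analysis.
    """
    susp_sets = [tokens(p) for p in suspicious]
    src_sets = [tokens(p) for p in source]
    pairs = []
    for (i, s), (j, t) in product(enumerate(susp_sets), enumerate(src_sets)):
        score = jaccard(s, t)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    suspicious = ["The cat sat on the mat.", "Completely unrelated words here."]
    source = ["A cat sat on a mat.", "Stock prices rose sharply today."]
    for i, j, score in candidate_paragraph_pairs(suspicious, source):
        print(f"suspicious[{i}] ~ source[{j}]: {score:.2f}")
```

Only the paragraph pairs that pass this cheap filter would be handed to a more expensive sentence-level model, which is how a coarse first stage can reduce overall processing time.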

Keywords/Search Tags:

deep learning, plagiarism detection, multi-feature, multi-sentence semantic fusion

Related items

1	Research On Semantic Similarity Matching Algorithm Of Questions Based On Deep Learning
2	Deep Sentence Interactive Matching Model Based On Multi-perspective Feature Fusion
3	Plagiarism Detection Of Multi-threaded Programs Via Extraction And Representation Learning
4	Design And Implementation Of Document Plagiarism Detection System Based On Semantic Neural Network
5	Research On Image Semantic Description Generation Method Based On Deep Learning
6	Research On Salient Object Detection Algorithm Based On Multi-layer Feature Fusion
7	Research On Deep Learning Object Detection Technology Based On Multi-Scale Feature Fusion
8	Research On Semantic Segmentation Method Of Remote Sensing Image Based On Deep Learning
9	Research Of Deep Learning Based On Upsampling Technology
10	Plagiarism Detection Algorithm Based On BiLSTM And Its Application In Duplicate Checking System