
Research On Paraphrase Identification Based On Deep Learning

Posted on: 2020-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Tian
Full Text: PDF
GTID: 2428330575468799
Subject: Software engineering
Abstract/Summary:
Paraphrase identification determines whether two text fragments are semantically equivalent; in essence, it judges the semantic match between texts. It underpins research fields such as information retrieval, machine translation, automatic question answering and other paraphrase-related problems, and it is a key technology and basic research topic in natural language understanding. This thesis studies paraphrase identification under the deep learning framework, with the aim of improving identification performance. It focuses on three problems: constructing a multi-layer neural network based on text semantic features, automatically generating paraphrase corpora, and modeling a deep paraphrase identification model for semantic interaction on multiple syntactic features. The work covers the following three aspects:

1) Accurately labeled corpus data for paraphrase identification is scarce: real paraphrase corpora are difficult to acquire, and constructing them manually is costly and time-consuming. This thesis proposes a corpus generation model, PTGM-GAN (Paraphrase Text Generation Model-Generative Adversarial Networks). Within the generative adversarial network framework, the automatic construction of a paraphrase corpus is modeled as the automatic generation of paraphrase texts. The generator produces paraphrase texts guided by the original sentence. The discriminator is a convolutional neural network trained on sentence pairs, and its judgment of the generated text is fed back to the generator. The experiments use the Seq2Seq model and VAE-SVG, both advanced text generation methods, as baselines. Experiments on the Microsoft COCO (MSCOCO) dataset and the Quora question dataset demonstrate the effectiveness of PTGM-GAN.

2) To address the poor performance of text semantic matching methods in paraphrase identification, this thesis proposes a deep paraphrase identification model, DPIM-MLSF (Deep Paraphrase Identification Model-Merging Lexical and Semantics Features), which integrates lexical and semantic features. Taking METEOR, a machine translation evaluation metric, as a representative, the model captures knowledge-base-driven semantic features for the semantic matching of texts and combines them with traditional word-matching features to construct a multi-layer neural network. Experiments on the MSRP and PAN2010 datasets show that the proposed model achieves higher F1 values than models based solely on vocabulary matching or solely on semantic matching, and higher than typical classification models such as SVM, Bagging and AdaBoost.

3) Existing deep paraphrase identification models with syntactic and semantic interaction realize semantic interaction only on the same syntactic features. To address this, this thesis proposes a deep paraphrase identification model, DPIM-ISMSF (Deep Paraphrase Identification Model-Interacting Semantics on Multi-Syntactic Features), for semantic interaction on different syntactic features. Different syntactic roles in the texts interact with each other, and the model fuses the semantic features from 2) to perform paraphrase identification. Experiments on MSRP, PAN2010 and the corpora expanded in 1) (MSRP+ and PAN2010+) verify the validity of the model.
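As a rough illustration of the word-matching features and the F1 measure mentioned in aspect 2), the following is a minimal Python sketch. It is not the thesis's DPIM-MLSF model and it does not implement METEOR. The function names (`lexical_features`, `overlap_f1`, `f1_score`), the whitespace tokenizer and the choice of three features are all assumptions made for illustration only.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_f1(a_tokens, b_tokens, n=1):
    """Harmonic mean of n-gram precision and recall between two sentences."""
    a, b = ngrams(a_tokens, n), ngrams(b_tokens, n)
    matched = sum((a & b).values())  # multiset intersection counts shared n-grams
    if matched == 0:
        return 0.0
    precision = matched / sum(a.values())
    recall = matched / sum(b.values())
    return 2 * precision * recall / (precision + recall)


def lexical_features(s1, s2):
    """Hypothetical feature vector: unigram overlap, bigram overlap, length ratio."""
    t1, t2 = tokenize(s1), tokenize(s2)
    length_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2), 1)
    return [overlap_f1(t1, t2, 1), overlap_f1(t1, t2, 2), length_ratio]


def f1_score(gold, predicted):
    """F1 for the positive (paraphrase) class; labels are 0 or 1."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(lexical_features("the cat sat on the mat", "the cat sat on a mat"))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a setup like the one the abstract describes, a vector of this kind would presumably be concatenated with semantic features (for example a METEOR score) before being passed to a classifier; that step is omitted here because the abstract does not specify it.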
Keywords/Search Tags:Paraphrase identification, Deep learning, GAN, Paraphrase generation