
Research On Evaluation Methods For Paraphrase Generation

Posted on: 2023-08-30  Degree: Master  Type: Thesis
Country: China  Candidate: X R Jian  Full Text: PDF
GTID: 2558306848955149  Subject: Computer technology
Abstract/Summary:
Paraphrase generation aims to produce a sentence (i.e., a paraphrase) that preserves the meaning of a given input sentence while expressing it differently; it has been widely used in customer-service-oriented dialogue and product introduction. Appropriate evaluation metrics are therefore a crucial factor in improving paraphrase generation. The mainstream paraphrase-generation metrics are borrowed from machine translation and evaluate semantic adequacy and fluency by computing the degree of matching between a paraphrase and a set of references. However, manually written references are limited: they cannot cover the rich variety of paraphrase phenomena and thus lead to incorrect evaluation results. In addition, no existing metric evaluates expression diversity. To address these problems, we investigate semantic evaluation metrics based on deep neural networks, diversity evaluation metrics, and combined metrics. The contributions are summarized as follows.

(1) We design a systematic analysis and evaluation method for the evaluation metrics of paraphrase generation. First, we analyze the factors considered by popular metrics, including n-grams, synonyms, and semantic matching at different granularities. Then, we design an annotation specification and use it to annotate a diversity-oriented evaluation dataset. We use Pearson's correlation coefficient and Spearman's correlation coefficient to evaluate the automatic metrics. The experimental results show that the deep-learning-based semantic consistency metrics outperform mainstream metrics by about 9%, which demonstrates the superiority of deep semantic metrics in the case of diversified paraphrases.

(2) We propose a comprehensive, deep-learning-based evaluation metric over three dimensions. First, we propose a semantic evaluation metric that relies on the input sentence, a fluency evaluation metric that relies on the references, and three diversity evaluation metrics based on surface information. Then, we propose unified evaluation metrics that combine the above metrics with three fusion methods: the linear form PEWF, the product form PEMF, and the exponentially weighted form WPEMF. Experimental results on the diversity-oriented dataset show that the proposed unified evaluation metrics outperform all of the single-dimension metrics. In particular, the WPEMF fusion metric achieves the best results, with Pearson's and Spearman's correlation coefficients of 57.40% and 59.20%, respectively.
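As a rough illustration only (the thesis's actual PEWF, PEMF, and WPEMF definitions appear in the full text, and the weights and score functions below are placeholders), score fusion over the three dimensions and correlation-based meta-evaluation against human judgments can be sketched as:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson computed on ranks (no tie correction)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))

def fuse(semantic, fluency, diversity, mode="exp", weights=(1/3, 1/3, 1/3)):
    """Illustrative fusion of per-dimension scores in [0, 1]:
    linear weighted sum, plain product, and exponentially weighted product."""
    scores = (semantic, fluency, diversity)
    if mode == "linear":                 # linear-form combination
        return sum(w * s for w, s in zip(weights, scores))
    if mode == "product":                # product-form combination
        out = 1.0
        for s in scores:
            out *= s
        return out
    if mode == "exp":                    # exponentially weighted product
        return math.prod(s ** w for w, s in zip(weights, scores))
    raise ValueError(f"unknown mode: {mode}")

# Meta-evaluation: correlate fused metric scores with human annotations.
metric_scores = [fuse(0.9, 0.8, 0.4), fuse(0.6, 0.7, 0.9), fuse(0.3, 0.5, 0.2)]
human_scores = [0.85, 0.75, 0.30]
print(pearson(metric_scores, human_scores), spearman(metric_scores, human_scores))
```

A higher correlation with human judgments indicates a better automatic metric; the exponentially weighted product rewards balanced scores across dimensions, since a near-zero score on any single dimension drags the fused score down.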
Keywords/Search Tags: Paraphrase Evaluation, Paraphrase Generation, Semantic Representation, Diversity, Automatic Evaluation, Human Evaluation