
Comparative Question Generation Based On Advanced CopyNet

Posted on: 2023-09-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Du
GTID: 2558307061953879 | Subject: Software engineering
Abstract/Summary:
Question generation is a subtask of natural language generation, and comparative question generation is a problem within it. A comparative question is composed of comparative elements, including COMPAREE, STANDARD, INDEX, PARAMETER, and others. Comparative questions fall into two categories: superlative and gradable. Existing datasets target general question generation; none is dedicated to comparative question generation. Existing methods mainly generate questions with CopyNet, a Seq2Seq model with a copy mechanism that can copy out-of-vocabulary (OOV) words from the input sequence into the generated question. This approach faces two difficulties:

(1) Mixed use of INDEX words. Word probability distributions differ between gradable and superlative comparative questions, so the model may generate an INDEX that belongs to the other question type. For example, a gradable question should contain "more", not "most".

(2) Wrong matching of comparative elements. Gradable and superlative questions order their comparative elements differently, and that order is governed by the collocation relationships between the elements, so the model may combine elements incorrectly. For example, INDEX followed by STANDARD ("more" + "Egypt") is wrong; the correct combination is INDEX followed by PARAMETER ("more" + "points").

The main work of this thesis is as follows:

(1) To address the lack of a dataset dedicated to comparative question generation, the WikiCompareQuestion dataset is constructed. It is filtered from the public WikiTableQuestions dataset, yielding 4,790 comparative questions together with their corresponding tables. The comparative elements in each question are labeled manually, so the dataset contains the comparative elements, their attributes, the comparative questions, and the corresponding tables.

(2) To address the mixed use of INDEX words, the STC-Net (Soft-Template CopyNet) model is proposed, which introduces a soft-template mechanism into CopyNet. Comparative questions are masked and fed into a GRU encoder, which outputs a soft-template vector; this vector is integrated into the generate mode of CopyNet. Experimental results show that STC-Net outperforms CopyNet by 1.8%, 1.3%, and 1.4% on BLEU-1, ROUGE-L, and METEOR, respectively, and also outperforms current mainstream generation models.

(3) To address the wrong matching of comparative elements, the PSC-Net (Pseudo-SQL CopyNet) model is proposed. It uses an intermediate representation language, Pseudo-SQL, to improve the input layer of CopyNet: Pseudo-SQL templates are designed to organize the comparative elements, and a mixed embedding of each comparative element and its attribute serves as the input to PSC-Net. Experimental results show that PSC-Net outperforms STC-Net by 0.5%, 0.6%, and 0.8% on BLEU-1, ROUGE-L, and METEOR, respectively.
Keywords/Search Tags: Natural Language Generation, CopyNet, Question Generation, Comparative Question Generation
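The abstract does not say how the 4,790 comparative questions were filtered from WikiTableQuestions or how gradable and superlative questions are told apart. The sketch below assumes a crude cue-word heuristic purely for illustration; it also shows the INDEX mixed-use check behind difficulty (1), where a gradable question should contain "more" rather than "most". The word lists and the functions `classify` and `index_matches_type` are hypothetical, and a real filter would need more care (for example, "later" is not always comparative).

```python
from __future__ import annotations

import re

# Hypothetical cue-word lists (an assumption; the thesis does not publish its filter).
SUPERLATIVE = {"most", "least", "fewest", "highest", "lowest", "largest",
               "smallest", "biggest", "best", "worst", "longest", "shortest",
               "earliest", "latest", "greatest", "oldest", "youngest"}
GRADABLE = {"more", "less", "fewer", "higher", "lower", "larger",
            "smaller", "bigger", "better", "worse", "longer", "shorter",
            "earlier", "later", "greater", "older", "younger"}


def tokenize(text: str) -> list[str]:
    """Lower-cased alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def classify(question: str) -> str | None:
    """Return 'superlative', 'gradable', or None when no cue word is present."""
    tokens = tokenize(question)
    if any(t in SUPERLATIVE for t in tokens):
        return "superlative"
    if any(t in GRADABLE for t in tokens):
        return "gradable"
    return None


def index_matches_type(question_type: str, index_word: str) -> bool:
    """False signals the INDEX mixed-use error, e.g. 'most' in a gradable question."""
    expected = GRADABLE if question_type == "gradable" else SUPERLATIVE
    return index_word.lower() in expected


for q in ["Which nation won the most gold medals?",
          "Did Ghana score more points than Egypt?",
          "Who coached the team in 1998?"]:
    print(classify(q), "|", q)
print(index_matches_type("gradable", "most"))  # False: mixed-use error
```

A lexical heuristic like this only proposes candidates; the thesis states that the comparative elements were then labeled manually.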
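The abstract describes CopyNet as a Seq2Seq model that can copy OOV words from the input. As a simplified sketch of a single decoding step, following the mixing rule of the original CopyNet paper (Gu et al., 2016): generate-mode scores over a fixed vocabulary and copy-mode scores over source positions share one normaliser, and a word's probability is the sum of its mass from both modes. Here the scores are plain numbers rather than outputs of a decoder state, the UNK token is omitted, and the function name `copynet_distribution` and the toy values are hypothetical.

```python
import math


def copynet_distribution(vocab, gen_scores, source_tokens, copy_scores):
    """One decoding step of a CopyNet-style output layer (simplified sketch).

    vocab / gen_scores: in-vocabulary words and their generate-mode scores.
    source_tokens / copy_scores: input tokens (OOV allowed) and copy-mode scores.
    Both modes share a single normaliser, so the result sums to 1.
    """
    if len(vocab) != len(gen_scores) or len(source_tokens) != len(copy_scores):
        raise ValueError("each score list must align with its token list")
    z = sum(math.exp(s) for s in gen_scores) + sum(math.exp(s) for s in copy_scores)
    probs = {}
    for word, score in zip(vocab, gen_scores):  # generate mode
        probs[word] = probs.get(word, 0.0) + math.exp(score) / z
    for word, score in zip(source_tokens, copy_scores):  # copy mode
        # A source word seen at several positions accumulates copy mass from each.
        probs[word] = probs.get(word, 0.0) + math.exp(score) / z
    return probs


vocab = ["more", "most", "points", "than"]
gen_scores = [2.0, 0.5, 1.0, 1.0]
source = ["ghana", "points", "egypt"]  # "ghana" and "egypt" are OOV here
copy_scores = [1.5, 0.5, 1.0]
probs = copynet_distribution(vocab, gen_scores, source, copy_scores)
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:>6} {p:.3f}")
```

"points" receives probability from both modes because it is in the vocabulary and in the source, while the OOV entity names can only be copied. Per the abstract, STC-Net feeds its soft-template vector into the generate mode, which in this picture would presumably influence the `gen_scores` side.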
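The abstract says STC-Net masks comparative questions before passing them to a GRU encoder, but it does not spell out the masking rule. One plausible scheme, assumed here: replace the COMPAREE, STANDARD, and PARAMETER spans with role placeholders while keeping INDEX and the surrounding function words, so the template retains the word form ("more" vs. "most") that distinguishes gradable from superlative questions. The function `mask_question` and the per-token label scheme are hypothetical, and the GRU encoding step is not shown.

```python
def mask_question(tokens, labels, keep=("O", "INDEX")):
    """Build a soft-template token sequence from a labelled comparative question.

    Tokens whose label is in `keep` are copied; every other labelled span is
    replaced by a single [ROLE] placeholder (consecutive tokens of the same role
    collapse into one placeholder).
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    template, prev = [], None
    for tok, lab in zip(tokens, labels):
        if lab in keep:
            template.append(tok)
            prev = None
        elif lab != prev:
            template.append(f"[{lab}]")
            prev = lab
    return template


gradable = mask_question(
    ["did", "ghana", "score", "more", "points", "than", "egypt", "?"],
    ["O", "COMPAREE", "O", "INDEX", "PARAMETER", "O", "STANDARD", "O"],
)
superlative = mask_question(
    ["which", "nation", "won", "the", "most", "gold", "medals", "?"],
    ["O", "COMPAREE", "O", "O", "INDEX", "PARAMETER", "PARAMETER", "O"],
)
print(" ".join(gradable))     # did [COMPAREE] score more [PARAMETER] than [STANDARD] ?
print(" ".join(superlative))  # which [COMPAREE] won the most [PARAMETER] ?
```

The two templates differ mainly in the INDEX form and in whether a STANDARD slot appears, which is the kind of type-specific signal a soft template could pass to the decoder.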
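The abstract does not give the Pseudo-SQL grammar. The sketch below assumes one template per question type, both written so that INDEX immediately precedes PARAMETER, the collocation the abstract identifies as correct ("more" + "points"). The template strings and the functions `to_pseudo_sql` and `index_followed_by_parameter` are hypothetical; each token is returned with an attribute (its role, or KEYWORD for template words) so it could feed an element-plus-attribute embedding.

```python
# Hypothetical Pseudo-SQL templates; both put INDEX directly before PARAMETER.
TEMPLATES = {
    "gradable": "SELECT {COMPAREE} WHERE {INDEX} {PARAMETER} THAN {STANDARD}",
    "superlative": "SELECT {COMPAREE} ORDER BY {INDEX} {PARAMETER} LIMIT 1",
}


def to_pseudo_sql(question_type, elements):
    """Fill the template and return (token, attribute) pairs.

    Template words get the attribute 'KEYWORD'; filled-in words get their role.
    """
    pairs = []
    for piece in TEMPLATES[question_type].split():
        if piece.startswith("{") and piece.endswith("}"):
            role = piece[1:-1]
            pairs.extend((word, role) for word in elements[role].split())
        else:
            pairs.append((piece, "KEYWORD"))
    return pairs


def index_followed_by_parameter(pairs):
    """Collocation check: every INDEX token must be followed by a PARAMETER token."""
    for (_, attr), nxt in zip(pairs, pairs[1:] + [None]):
        if attr == "INDEX" and (nxt is None or nxt[1] != "PARAMETER"):
            return False
    return True


gradable = to_pseudo_sql("gradable", {"COMPAREE": "Ghana", "INDEX": "more",
                                      "PARAMETER": "points", "STANDARD": "Egypt"})
superlative = to_pseudo_sql("superlative", {"COMPAREE": "nation", "INDEX": "most",
                                            "PARAMETER": "gold medals"})
print(" ".join(tok for tok, _ in gradable))     # SELECT Ghana WHERE more points THAN Egypt
print(" ".join(tok for tok, _ in superlative))  # SELECT nation ORDER BY most gold medals LIMIT 1
print(index_followed_by_parameter([("more", "INDEX"), ("Egypt", "STANDARD")]))  # False
```

Fixing the element order in the input means the encoder always sees INDEX beside PARAMETER, rather than having to infer that collocation from raw questions; the sketch assumes a single-word INDEX.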
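PSC-Net's input is described only as a mixed embedding of each comparative element and its attribute. A minimal sketch, assuming concatenation of a word vector and an attribute vector (summation would be an equally plausible choice): the class `MixedEmbedding`, its dimensions, and the random initial values are hypothetical stand-ins for trained embedding tables.

```python
import random


class MixedEmbedding:
    """Hypothetical mixed embedding: a word vector concatenated with an attribute vector."""

    def __init__(self, words, attributes, word_dim=4, attr_dim=2, seed=0):
        rng = random.Random(seed)
        # Small random values stand in for trained embedding tables.
        self.word_table = {w: [rng.uniform(-0.1, 0.1) for _ in range(word_dim)] for w in words}
        self.attr_table = {a: [rng.uniform(-0.1, 0.1) for _ in range(attr_dim)] for a in attributes}
        # Unknown words map to a zero vector; copying OOV words is left to the copy mechanism.
        self.unk = [0.0] * word_dim

    def __call__(self, pairs):
        """Embed (word, attribute) pairs; list '+' concatenates the two vectors."""
        return [self.word_table.get(w, self.unk) + self.attr_table[a] for w, a in pairs]


pairs = [("SELECT", "KEYWORD"), ("ghana", "COMPAREE"), ("more", "INDEX"),
         ("points", "PARAMETER"), ("egypt", "STANDARD")]
embed = MixedEmbedding(words=["SELECT", "more", "points"],
                       attributes=["KEYWORD", "COMPAREE", "INDEX", "PARAMETER", "STANDARD"])
vectors = embed(pairs)
print(len(vectors), len(vectors[0]))  # 5 6
```

In this toy setup "ghana" and "egypt" are both out of vocabulary and share the zero word vector, yet their attribute halves differ, so the input still tells COMPAREE apart from STANDARD. That is one way an attribute signal could help against wrong element matching.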
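The abstract reports results in BLEU-1, ROUGE-L, and METEOR. As a hedged illustration of what the first two measure, the sketch below computes sentence-level BLEU-1 (clipped unigram precision times a brevity penalty) and ROUGE-L (an F-measure over the longest common subsequence). It is not the thesis's evaluation script: published scores are usually computed at corpus level with standard toolkits, tokenization affects the numbers, and the value of β in ROUGE-L varies between implementations. METEOR, which also matches stems and synonyms, is omitted.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Sentence-level BLEU-1: clipped unigram precision times the brevity penalty."""
    if not candidate:
        return 0.0
    ref_counts = Counter(reference)
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(candidate).items())
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * clipped / c


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming, two rows)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure: (1 + beta^2) * P * R / (R + beta^2 * P)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(candidate), lcs / len(reference)
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p)


ref = "did ghana score more points than egypt ?".split()
hyp = "did ghana score most points than egypt ?".split()  # INDEX mixed-use error
print(round(bleu1(hyp, ref), 3), round(rouge_l(hyp, ref), 3))  # 0.875 0.875
```

A single wrong INDEX word costs one unigram out of eight here, which helps explain why fixing such errors shows up as gains of a point or two in these metrics.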
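The abstract reports STC-Net against CopyNet and PSC-Net against STC-Net, but not PSC-Net against CopyNet directly. Assuming both comparisons use the same test set, chaining them gives the figures below. The abstract does not say whether the percentages are absolute points or relative gains, so both readings are shown; they agree to one decimal place.

```latex
\begin{aligned}
\text{Absolute points:}\quad
  &\text{BLEU-1: } 1.8 + 0.5 = 2.3, \quad
   \text{ROUGE-L: } 1.3 + 0.6 = 1.9, \quad
   \text{METEOR: } 1.4 + 0.8 = 2.2 \\
\text{Relative gains:}\quad
  &\text{BLEU-1: } 1.018 \times 1.005 = 1.02309 \approx +2.3\% \\
  &\text{ROUGE-L: } 1.013 \times 1.006 = 1.019078 \approx +1.9\% \\
  &\text{METEOR: } 1.014 \times 1.008 = 1.022112 \approx +2.2\%
\end{aligned}
```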