Question Generation (QG) is a challenging Natural Language Processing (NLP) task that aims to generate questions from a given answer and context, such that the generated question can be answered by the given answer within that context. Recently, with the development of deep learning, it has become possible to automatically generate high-quality questions with neural networks, and QG has therefore attracted increasing attention from the NLP community. In this thesis, we focus on improving the performance of QG models by effectively utilizing linguistic features of the text, including Named Entity Recognition (NER), Part-of-Speech (POS) tags, and so on.

Most previous works are based on the sequence-to-sequence (seq2seq) framework, adopting attention and copy mechanisms. As with traditional word embeddings, these works normally embed linguistic features with a set of trainable parameters, which leaves the linguistic features under-exploited. To address this issue, we propose to utilize linguistic information via large pre-trained neural models. Specifically, these pre-trained models are first trained on several specific NLP tasks so as to better represent linguistic features; the resulting feature representations are then fused into a seq2seq-based QG model to guide question generation. In addition, we introduce a novel linguistic feature customized for QG, the Question Answering Feature (QAF). Since QA and QG are dual tasks, this feature captures the relationship among the answer, context, and question, helping to generate questions with higher answerability.

To demonstrate the effectiveness of our approaches, we conduct extensive experiments on two benchmark QG datasets: SQuAD and MS-MARCO. The experimental results show that our approach outperforms state-of-the-art QG systems, improving over the baseline by 17.2% and 6.2% under the BLEU-4 metric on the two datasets, respectively. Furthermore, we conduct extensive case studies to analyze the influence of deep linguistic features on question generation. Finally, we propose DDS (Difficulty-based Data Splitting), a universal strategy for exploring a model's performance boundaries, which can estimate a model's best and worst performance on a dataset. By evaluating these performance boundaries, researchers can understand their models more comprehensively.
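The conventional approach criticized above, embedding linguistic features with trainable parameters, typically looks up a small embedding for each POS/NER tag and concatenates it with the word embedding. The following NumPy sketch illustrates this; the vocabularies, dimensions, and random initialization are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabularies (assumptions, not the thesis's actual tag sets).
word_vocab = {"who": 0, "wrote": 1, "hamlet": 2}
pos_vocab = {"WP": 0, "VBD": 1, "NNP": 2}
ner_vocab = {"O": 0, "PERSON": 1, "WORK_OF_ART": 2}

WORD_DIM, FEAT_DIM = 8, 2

# Trainable parameter tables: one row per vocabulary entry.
word_emb = rng.normal(size=(len(word_vocab), WORD_DIM))
pos_emb = rng.normal(size=(len(pos_vocab), FEAT_DIM))
ner_emb = rng.normal(size=(len(ner_vocab), FEAT_DIM))

def encode(words, pos_tags, ner_tags):
    """Concatenate word, POS, and NER embeddings for each token."""
    rows = []
    for w, p, n in zip(words, pos_tags, ner_tags):
        rows.append(np.concatenate([
            word_emb[word_vocab[w]],
            pos_emb[pos_vocab[p]],
            ner_emb[ner_vocab[n]],
        ]))
    return np.stack(rows)  # shape: (seq_len, WORD_DIM + 2 * FEAT_DIM)

X = encode(["who", "wrote", "hamlet"],
           ["WP", "VBD", "NNP"],
           ["O", "O", "WORK_OF_ART"])
print(X.shape)  # (3, 12)
```

Because the tag embeddings here are trained only on the end task, they carry no external linguistic knowledge, which is the limitation the pre-trained feature representations in this thesis are meant to overcome.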
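The core idea behind DDS can be sketched as follows: given a per-example difficulty score, evaluating the model on the easiest and hardest slices of the data approximates its best-case and worst-case performance. The difficulty function, metric, and split fraction below are hypothetical placeholders, not the thesis's actual choices.

```python
def performance_bounds(examples, difficulty, evaluate, frac=0.3):
    """Estimate a model's best/worst performance via difficulty-based splits.

    examples   -- list of evaluation examples
    difficulty -- function mapping an example to a difficulty score
    evaluate   -- function mapping a list of examples to a metric (higher = better)
    frac       -- fraction of the data kept in each extreme split (assumed value)
    """
    ranked = sorted(examples, key=difficulty)
    k = max(1, int(len(ranked) * frac))
    easiest, hardest = ranked[:k], ranked[-k:]
    # Best-case estimate from the easiest slice, worst-case from the hardest.
    return evaluate(easiest), evaluate(hardest)

# Toy usage: difficulty = context length, metric = mean of precomputed scores.
data = [{"context_len": n, "score": 1.0 / n} for n in range(1, 11)]
best, worst = performance_bounds(
    data,
    difficulty=lambda ex: ex["context_len"],
    evaluate=lambda exs: sum(ex["score"] for ex in exs) / len(exs),
)
print(best > worst)  # True
```

The gap between the two returned values indicates how sensitive the model is to example difficulty, which is the kind of performance-boundary analysis the abstract describes.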