| Keyphrase generation is an important and widely studied task in natural language processing. Keyphrases provide brief but representative information about a document and thus help people quickly obtain valuable information from large amounts of data. With advances in deep neural networks, keyphrase generation based on the Seq2Seq model has attracted increasing attention and has made great progress. The key requirements in keyphrase generation are the accuracy and the diversity of the predictions. On the one hand, a keyphrase generation model must ensure the accuracy of its predictions. However, the sequential semantic learning adopted by most existing methods cannot effectively capture the salient information of the document, and the training strategy fails to ensure semantic consistency between the source document and the target keyphrases, which reduces the accuracy of the generated keyphrases. On the other hand, a good keyphrase generation model should produce diverse keyphrase expressions. Nevertheless, current models that use the full text as the basic decoding unit cannot effectively model the one-to-many mapping between a document and its keyphrases, so the generated keyphrases may lack diversity. To address these two problems, namely that sequential methods cannot effectively capture salient information and that generated keyphrases lack diversity, this thesis proposes to use syntactic dependency information and a new decoding pattern to improve the performance of the keyphrase generation model. Based on the proposed solutions, a prototype system is designed and implemented to provide accurate and diverse keyphrase predictions. The research contents are summarized as follows. (1) To solve the accuracy problem caused by inadequate capture of salient information, a syntactic dependency Graph for Keyphrase Generation (GKG) method is proposed. Specifically, the document-level graph determined by the syntactic structure is fed 
into a structural-semantic encoding mechanism to capture dependency structure information. Combining dependency information with contextual information is potentially useful for identifying salient information. Meanwhile, estimating and maximizing the mutual information between the document and its target keyphrases constrains the decoder to learn more information specific to the input document, thus enforcing consistency between them. (2) To tackle the lack of diversity in generated keyphrases, Hierarchical Subtopic Modeling based Keyphrase Generation (HSMKG) is proposed. The semantic information of the full document is partitioned into subtopics, and the subtopic is used as the basic decoding unit to explicitly model the one-to-many mapping. Multiple decoders select subtopics before decoding, which improves diversity at the content selection level by focusing on different subtopics. To account for different means of expression, a diversity-promoting algorithm based on random sampling in the latent space is further proposed. Experimental results show that the proposed method achieves a better trade-off between quality and diversity than existing approaches. (3) A prototype keyphrase generation system based on document structure information is implemented. The system combines the above algorithms and adopts a hierarchical design based on the MVC framework to visually display the predictions and provide keyphrase-based document management. It uses an information interaction module to receive and process user requests and display the corresponding results, and generates accurate and diverse keyphrases through an algorithm module based on GKG and HSMKG. |
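
The abstract does not say how the document-level graph in GKG is constructed. As a rough illustration only, the sketch below merges per-sentence dependency trees into one undirected graph. The function name `build_document_graph`, the `(word, head_index)` input format, and the choice to link the roots of adjacent sentences are all assumptions made for this example, not details taken from the thesis. In practice the head indices would come from a dependency parser.

```python
from collections import defaultdict


def build_document_graph(sentences):
    """Merge per-sentence dependency trees into one undirected graph.

    `sentences` is a list of sentences; each sentence is a list of
    (word, head_index) pairs, where head_index is the position of the
    syntactic head inside the same sentence, or -1 for the root.
    Hypothetical input format, chosen only for this illustration.
    """
    tokens = []
    adjacency = defaultdict(set)
    previous_root = None

    for sentence in sentences:
        offset = len(tokens)  # global index of this sentence's first token
        root = None
        for local_index, (word, head) in enumerate(sentence):
            tokens.append(word)
            node = offset + local_index
            if head < 0:
                root = node
            else:
                # Undirected edge between a token and its syntactic head.
                adjacency[node].add(offset + head)
                adjacency[offset + head].add(node)

        # Assumption: connect roots of adjacent sentences so that the
        # sentence trees form a single document-level graph.
        if root is not None:
            if previous_root is not None:
                adjacency[previous_root].add(root)
                adjacency[root].add(previous_root)
            previous_root = root

    graph = {node: sorted(adjacency[node]) for node in range(len(tokens))}
    return tokens, graph


# Two toy sentences with hand-written head indices.
doc = [
    [("graphs", 1), ("help", -1)],
    [("models", 1), ("improve", -1)],
]
tokens, graph = build_document_graph(doc)
print(tokens)
print(graph)
```

The resulting adjacency lists could then be passed to a graph encoder alongside contextual token representations. How the thesis combines the two is not specified in the abstract.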