
Keyphrase Extraction For Legal Questions Based On Sequence To Sequence Model

Posted on: 2020-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: G W Tong
Full Text: PDF
GTID: 2416330602960170
Subject: Communication and Information System
Abstract/Summary:
Keywords are short, summarizing expressions that describe the topic of a longer text. High-quality keywords give users concentrated and valuable information. Keyword extraction for legal questions supports legal information retrieval, legal question-answering systems and legal information classification, thereby improving citizens' legal literacy and promoting the construction of a country ruled by law. Legal questions are mostly short sentences, so this thesis studies keyword extraction for legal questions as a short-text problem. Most existing keyword extraction algorithms are designed for long texts and work in two steps: the first step divides the text into multiple text blocks, which serve as candidate keywords; the second step ranks the candidate keywords by their importance to the text content. These algorithms rely on a large number of textual and statistical features. Such features are difficult to obtain from short legal questions, so these algorithms are not effective on them. This thesis proposes a sequence-to-sequence (seq2seq) model trained with reinforcement learning to extract keywords from short-text legal questions. The main work is as follows:

(1) Traditional keyword extraction algorithms cannot produce keywords that do not appear in the original text. To address this, and inspired by the good results of sequence-to-sequence models in machine translation, keyword extraction is treated as a generation task rather than a pure extraction task, and a sequence-to-sequence model is used to generate the keywords. The principle of the recurrent neural network and the encoder-decoder structure of the sequence-to-sequence model are introduced, followed by the attention mechanism and the copy mechanism. Finally, a comparative experiment shows that the sequence-to-sequence model achieves better results in keyword extraction for legal questions.

(2) Sequence-to-sequence models are conventionally trained with the cross-entropy loss, under which a generated keyword set that differs from the reference only in the order of its keywords is treated as wrong. To solve this problem, this thesis proposes training the model with reinforcement learning. The basic principles of reinforcement learning are introduced first. A reward function is then proposed that rewards or penalizes the generated keyword set, discouraging duplicate keywords and ignoring the order of the keyword set. Finally, the approach is compared with a cross-entropy-based sequence-to-sequence model and a sequence-to-sequence model with a copy mechanism. The results show that the sequence-to-sequence model trained with reinforcement learning achieves better experimental results.
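The abstract does not state the reward function itself. As a rough illustration of the idea in (2), the Python sketch below scores a generated keyword set with an order-independent F1 against the reference set and subtracts a fixed penalty for each duplicate. The function name `keyphrase_reward`, the F1 choice and the `dup_penalty` value are assumptions made for illustration, not the thesis's actual definition.

```python
from typing import Sequence


def keyphrase_reward(
    generated: Sequence[str],
    gold: Sequence[str],
    dup_penalty: float = 0.1,  # assumed value, not taken from the thesis
) -> float:
    """Order-independent F1 over unique phrases, minus a penalty per duplicate."""

    def norm(phrase: str) -> str:
        # Lowercase and collapse whitespace so trivial variants match.
        return " ".join(phrase.lower().split())

    gen = [norm(p) for p in generated if p.strip()]
    gen_set = set(gen)
    gold_set = {norm(p) for p in gold if p.strip()}
    if not gen_set or not gold_set:
        return 0.0

    # Set intersection ignores the order in which phrases were generated.
    hits = len(gen_set & gold_set)
    precision = hits / len(gen_set)
    recall = hits / len(gold_set)
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)

    # Every repeated phrase beyond its first occurrence is penalized.
    duplicates = len(gen) - len(gen_set)
    return f1 - dup_penalty * duplicates


if __name__ == "__main__":
    gold = ["divorce", "child custody"]
    print(keyphrase_reward(["child custody", "divorce"], gold))
    print(keyphrase_reward(["divorce", "divorce"], gold))
```

In a policy-gradient setup, a scalar like this would typically be computed once per sampled keyword sequence and used to weight that sequence's log-likelihood. Because the score depends only on the set of phrases, two outputs that differ only in order receive the same reward, unlike token-level cross-entropy.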
Keywords/Search Tags: Keyphrase extraction, sequence-to-sequence model, reinforcement learning