Research On Coreference Resolution Of Speaker Mentions In Legal Court Record Documents

Posted on: 2021-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Gao
Full Text: PDF
GTID: 2416330629984456
Subject: Cyberspace security
Abstract/Summary:
Coreference resolution is one of the fundamental tasks in natural language processing and plays an important role in text semantic understanding. Existing research concentrates mainly on general-domain texts, such as newswire, broadcast conversations, Wikipedia, and blogs, while coreference resolution in the legal field has received less attention. Legal texts are rigorous, highly specialized, and knowledge-intensive, which sets them apart from ordinary texts. In Court Record Documents (CRDs), a speaker may be referred to in multiple ways, and resolving the co-referring mentions of the speakers is of great significance for legal text analysis. To this end, an end-to-end coreference resolution model incorporating label representation is proposed, in which entity recognition and coreference resolution are performed jointly. The work covers the following three aspects.

1) Based on the pronoun entities, name entities, and status entities related to the speakers, relation extraction among these entities is formalized as a document-level coreference resolution problem. Two coreference schemas are compared experimentally: generating coreference links from pronoun entities to name entities, and from pronoun entities to status entities. Based on the experimental findings on entity relevance, a labeling strategy is designed that integrates the predefined litigation-status categories into the labels of the name entities, so that end-to-end coreference resolution can be performed using only name entities and pronoun entities.

2) A pipeline coreference resolution model based on a multi-scoring setup is proposed. The task is decomposed into entity recognition and coreference resolution. First, a sequence labeling model is employed to extract entities. Second, the reference relations between entities are integrated into the model by applying graph convolutional networks. Third, a feed-forward neural network and a deep biaffine attention mechanism are used to score the candidate pairs, taking into account the dependency between the anaphor and the antecedent in each candidate pair.

3) An end-to-end coreference resolution model incorporating label information is proposed. Building on the experimental conclusions from the pipeline model, the task is further addressed in a joint manner. First, a softmax pruning module is applied to dynamically select high-confidence spans as candidate entities. Second, the label representation of each entity is encoded in order to compute the similarity between the current entity and the label representation of a preceding antecedent, and these similarity scores are added to the final candidate scores. Finally, the multi-scoring setup is used to score the candidate pairs. Integrating label information effectively alleviates the imbalance in the number of entities, makes full use of the relation between pronoun entities and status entities, and improves the model's ability to predict coreference relations across different types of entities.

The experimental results on the CRDs show that the proposed end-to-end coreference resolution method can reasonably model the speaker pronoun resolution problem. Compared with the baseline model, the results improve by 2 to 7 percentage points, reaching an F1-score of 75.35%. The joint model effectively alleviates the error propagation problem of the pipeline model and enables information interaction between the subtasks, demonstrating the effectiveness of the proposed method.
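The span pruning and pair scoring steps described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the thesis's implementation: the function names, the confidence threshold, the use of class index 0 as a "not an entity" class, cosine similarity as the similarity measure, and the toy `plaintiff`/`defendant` labels are all hypothetical, and the feed-forward scorer and graph convolution layers are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prune_spans(span_logits, threshold=0.5):
    """Keep spans whose most probable class is an entity type.

    Assumption: class index 0 means 'not an entity', and a span is kept
    only if its best entity class has probability >= threshold.
    Returns (span_id, class_id, probability) tuples.
    """
    kept = []
    for span_id, logits in enumerate(span_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=lambda i: probs[i])
        if best != 0 and probs[best] >= threshold:
            kept.append((span_id, best, probs[best]))
    return kept


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def biaffine(x, W, y, bias=0.0):
    """Bilinear score x^T W y + bias for an (anaphor, antecedent) pair."""
    Wy = [dot(row, y) for row in W]
    return dot(x, Wy) + bias


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def pair_score(anaphor_vec, antecedent_vec, antecedent_label, W, label_emb):
    """Biaffine pair score plus a label-similarity term.

    The similarity compares the current entity with the label
    representation of the candidate antecedent and is added to the score.
    """
    base = biaffine(anaphor_vec, W, antecedent_vec)
    sim = cosine(anaphor_vec, label_emb[antecedent_label])
    return base + sim


# Toy usage with made-up numbers (all values are illustrative only).
W = [[1.0, 2.0], [3.0, 4.0]]
label_emb = {"plaintiff": [1.0, 0.0], "defendant": [0.0, 1.0]}
pronoun_vec, name_vec = [1.0, 0.0], [0.0, 1.0]
scores = {
    label: pair_score(pronoun_vec, name_vec, label, W, label_emb)
    for label in label_emb
}
candidates = prune_spans([[2.0, 0.1, 0.1], [0.0, 3.0, 0.0], [0.2, 0.1, 0.3]])
```

In this toy setup, an antecedent whose label embedding aligns with the pronoun's representation receives a higher combined score than one whose label does not, which is the intuition behind adding the label-similarity term to the candidate score.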
Keywords/Search Tags: Legal Text Mining, End-to-end Coreference Resolution, Label Representation, Information Extraction, Natural Language Processing