A Study Of Named Entity Recognition For Legal Texts

Posted on: 2024-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: X R Zhang
GTID: 2556307061991689
Subject: Software engineering
Abstract/Summary:
In legal practice, judicial professionals must extract useful information from large numbers of legal texts (such as laws and regulations, judgments, and contracts). The primary carrier of such information is named entities. The key to legal information extraction is therefore named entity recognition for legal texts, whose goal is to identify legal entities with specific meanings and assign them to predefined categories. These entities are of significant value for applications such as automated legal document processing, legal information retrieval, case analysis, and intelligent contract review.

However, compared with named entity recognition in other fields, named entity recognition for legal texts faces several difficulties that stem from its domain specificity:
1) Recognition of long-expression entities. Legal texts contain many long-expression entities consisting of multiple nouns or phrases, which are difficult to tokenize and place more stringent demands on sequence modeling.
2) Recognition of nested entities. Legal texts contain many nested entities with multi-level structures whose boundaries intertwine and overlap, which makes them difficult to recognize.
3) Recognition of fine-grained entities. Legal texts call for fine-grained descriptions and therefore for more fine-grained entity types, which increases the number of entity types and aggravates data sparsity.

To address these difficulties, this thesis designs two legal named entity recognition approaches based on pre-trained language models, modeling the task from the perspectives of global normalization and machine reading comprehension, respectively. The main contributions of this thesis are summarized as follows:

1) To address the above three issues together, this thesis proposes a RoBERTa-based named entity recognition method for legal texts that combines a global normalization perspective with a pre-trained language model. Specifically, we first use RoBERTa to extract character-level feature representations of a legal document and the Skip-Gram method to extract its word-level feature representations, and we fuse the two to better capture the contextual information of entities in the document. Then, based on the concatenated representation, the global normalization module computes, for each subsequence of the document, a score indicating how likely it is to be an entity of a particular type. Finally, we employ the balanced Softmax function to decide from this score whether a subsequence is an entity of a specific type. Our evaluation experiments on a Chinese judicial domain dataset show that the proposed method outperforms the state-of-the-art baseline methods. Moreover, we applied our method in the Challenge of AI in Law Competition in 2021 (CAIL 2021) and won third prize in the information extraction track.

2) Unlike the first work, the second models legal named entity recognition from another perspective, machine reading comprehension, to further address the above issues and improve the accuracy of entity recognition in legal texts. The proposed approach, called named entity recognition based on machine reading comprehension and dependency parsing, incorporates Biaffine attention, dependency parsing, and machine reading comprehension into a unified framework to provide a more comprehensive legal named entity recognition scheme. Specifically, we construct a query statement for each entity type according to its description in the annotation guideline, and then reconstruct the dataset into 〈Query, Context, Answer〉 triples for the machine reading comprehension task. Then, we use the BERT pre-trained language model to encode the Query and Context and, after feature mapping, fuse the encodings with rotary position embeddings. Next, we use Biaffine attention to score each subsequence of a text. Finally, we use a balanced Softmax to decide whether a subsequence is an entity. Extensive experiments show that our model achieves good results in recognizing both nested and non-nested entities. Further experiments demonstrate the effectiveness of some of the components of the proposed model.
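
The reconstruction of an annotated dataset into 〈Query, Context, Answer〉 triples can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the entity types, the query wording, the example sentence, and the helper names `locate` and `to_mrc_triples` are all assumptions made for illustration, since the abstract does not reproduce the annotation guideline or the data format.

```python
from typing import Dict, List, Tuple

# Hypothetical entity types and query wording (assumptions for illustration;
# the thesis derives its queries from the annotation guideline).
QUERIES: Dict[str, str] = {
    "PERSON": "Find all natural persons who are parties to the case.",
    "ORG": "Find all organizations, such as companies or banks.",
    "LOC": "Find all geographic locations.",
}

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def locate(context: str, mention: str, etype: str) -> Span:
    """Return the character span of the first occurrence of `mention`."""
    start = context.index(mention)
    return (start, start + len(mention), etype)


def to_mrc_triples(context: str, spans: List[Span]) -> List[dict]:
    """Build one query/context/answers record per entity type."""
    triples = []
    for etype, query in QUERIES.items():
        # Each answer is a character span; a type may have zero or many.
        answers = sorted((s, e) for s, e, t in spans if t == etype)
        triples.append(
            {"type": etype, "query": query, "context": context, "answers": answers}
        )
    return triples


context = "John Doe sued the Bank of Ohio."
spans = [
    locate(context, "John Doe", "PERSON"),
    locate(context, "Bank of Ohio", "ORG"),
    locate(context, "Ohio", "LOC"),  # nested inside the ORG span
]
triples = to_mrc_triples(context, spans)
for t in triples:
    print(t["type"], [context[s:e] for s, e in t["answers"]])
```

Because every entity type gets its own query over the same context, a nested mention such as the location inside the organization name is simply an answer to a different query, so overlapping spans do not conflict in this representation.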
Keywords/Search Tags: Legal intelligence, Named entity recognition, Nested entities, Fine-grained entities, Neural network