In recent years, judicial reform has continued to deepen. As judicial institutions become increasingly digitized, the volume of judicial text data has grown exponentially, and the application of artificial intelligence in the judicial field has attracted extensive attention. Intelligent analysis and processing of massive numbers of legal documents has become an important topic in judicial artificial intelligence research. Named entity recognition (NER) in legal documents is a foundational task in judicial artificial intelligence, with important applications in legal question answering, sentencing prediction, and judicial knowledge graph construction. Research on named entity recognition has already advanced these downstream tasks. However, research on judicial named entity recognition is still in its infancy. Entity definitions are not closely aligned with judicial business, and high-quality named entity data resources are scarce. Static word vectors cannot handle polysemy, and nested judicial named entities are difficult to recognize. Using deep learning methods, this paper explores and analyzes these practical problems in depth. The main contents and results are as follows.

(1) For flat (non-nested) named entity recognition, we construct Legal Corpus-Flat, a judicial named entity recognition corpus of theft cases. We also propose a BERT-ON-LSTM-CRF method for named entity recognition in legal documents. The method has three stages:
- A pre-trained language model dynamically generates context-dependent semantic vectors as the model input.
- ON-LSTM models the input sequentially and hierarchically to extract text features.
- A CRF obtains the optimal tag sequence.

Experimental results show that the model reaches an F1 score of 86.09%, an improvement of 7.8% over the baseline model.

(2) For nested named entity recognition, we construct Legal Corpus-Nested, a judicial nested named entity recognition corpus built on Legal Corpus-Flat. We also propose a machine reading comprehension model based on span extraction. The model uses purpose-designed question templates and encodes each question together with the text as a BERT sentence pair, so that it learns the judicial prior knowledge contained in the templates. Two multi-class classifiers predict the start and end positions of entities, respectively. Span matching then pairs these positions to extract the corresponding entities, which better preserves the business information in judicial texts. The model reaches an F1 score of 83.28%, an improvement of 6.03% over the baseline model.

(3) To improve low-resource named entity recognition, we first review existing named entity definition schemes in the judicial field. We then propose a judicial attribute named entity definition scheme focused on judicial business. Next, we construct Legal Corpus-Transfer, a judicial named entity corpus suitable for transfer learning, and propose an adversarial transfer learning method to improve low-resource judicial named entity recognition. Experimental results show that the method significantly improves recognition of low-resource judicial named entities and can greatly reduce the annotation effort required for a new corpus. Introducing a pre-trained language model further improves the model's performance. Among the pre-trained language models compared, RoBERTa-wwm achieves the best results.
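In (1), ON-LSTM does the hierarchical modeling through its ordered "master" gates. In the original ON-LSTM design, these gates are built with the cumax activation, which is a cumulative sum of a softmax. The plain-Python sketch below shows only that gate computation. The input logits are made-up numbers, and the function names are illustrative rather than taken from this paper's code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cumax(xs: Sequence[float]) -> List[float]:
    """cumax(x) = cumsum(softmax(x)): non-decreasing values rising toward 1."""
    out, running = [], 0.0
    for p in softmax(xs):
        running += p
        out.append(running)
    return out


def master_gates(
    forget_logits: Sequence[float], input_logits: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """ON-LSTM-style master gates (illustrative names).

    The master forget gate rises from about 0 to 1 across neurons, and the
    master input gate (1 - cumax) falls from about 1 to 0. This ordering lets
    high-ranked neurons keep long-lived, higher-level information.
    """
    master_forget = cumax(forget_logits)
    master_input = [1.0 - v for v in cumax(input_logits)]
    return master_forget, master_input


# Hypothetical logits for a 4-neuron layer at one time step.
f_gate, i_gate = master_gates([0.5, 2.0, -1.0, 0.3], [1.0, -0.5, 0.2, 0.0])
print([round(v, 3) for v in f_gate], [round(v, 3) for v in i_gate])
```

In the full model, these logits would come from the current input and the previous hidden state. The sketch leaves that projection out.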
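The final CRF step in (1) selects the highest-scoring tag sequence with Viterbi decoding. Below is a minimal sketch. It assumes the emission scores (for example, per-token scores derived from ON-LSTM features) and the transition scores are already available. The three-tag O/B/I scheme and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> Tuple[float, List[int]]:
    """Return (best_score, best_tag_path) for a linear-chain CRF.

    emissions[t][j]:   score of tag j at token t (assumes at least one token)
    transitions[i][j]: score of moving from tag i to tag j
    start[j], end[j]:  scores for beginning / ending the sequence with tag j
    """
    n_tags = len(start)
    # Best score of any path ending in each tag at position 0.
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for landing on tag j at step t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    final = [score[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=lambda j: final[j])
    # Follow back-pointers from the best final tag to recover the path.
    path = [last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return final[last], path


# Hypothetical tags: 0 = O, 1 = B, 2 = I. O -> I and starting with I are penalized.
NEG = -100
transitions = [[0, 0, NEG], [0, 0, 0], [0, 0, 0]]
start = [0, 0, NEG]
end = [0, 0, 0]
emissions = [[2, 1, 0], [0, 0, 3], [1, 0, 0]]
best_score, best_path = viterbi_decode(emissions, transitions, start, end)
print(best_score, best_path)  # 5 [1, 2, 0]
```

Choosing the best tag at each token independently would give O at position 0 and I at position 1. The toy transition matrix penalizes that O to I step heavily, so Viterbi returns the consistent sequence B, I, O instead.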
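The span extraction in (2) can be sketched in the style of BERT-based machine reading comprehension NER. The decoding details here are an assumption, not this paper's implementation:
- The start and end classifiers are treated as producing a per-token probability for the entity type named in the question.
- The span-matching score comes from a hypothetical `match_prob` function.
- All probabilities in the example are invented.

Every candidate (start, end) pair is scored independently, so overlapping spans can both be kept. This is what allows nested entities.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def extract_spans(
    start_probs: Sequence[float],
    end_probs: Sequence[float],
    match_prob: Callable[[int, int], float],
    threshold: float = 0.5,
    max_len: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (start, end) token spans (end inclusive) for one question.

    start_probs[i]: probability that token i starts an entity of the queried type
    end_probs[i]:   probability that token i ends one
    match_prob:     hypothetical scorer for whether (start, end) is a real entity
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [j for j, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        for e in ends:
            if e < s:
                continue  # an entity cannot end before it starts
            if max_len is not None and e - s + 1 > max_len:
                continue  # optional cap on span length
            if match_prob(s, e) > threshold:
                spans.append((s, e))
    return spans


# Toy example: "Hangzhou" (token 5) is nested inside "Hangzhou West Lake District" (5..8).
tokens = ["Zhang", "stole", "a", "phone", "in", "Hangzhou", "West", "Lake", "District"]
start_p = [0.1, 0.0, 0.0, 0.0, 0.0, 0.95, 0.1, 0.1, 0.0]
end_p = [0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.85]
match = {(5, 5): 0.9, (5, 8): 0.8}
spans = extract_spans(start_p, end_p, lambda s, e: match.get((s, e), 0.0))
print([" ".join(tokens[s : e + 1]) for s, e in spans])
```

In a full system, each entity type would get its own question, and the question would be encoded with the text as a sentence pair. The decoding step above would then run once per question.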