| The construction of judicial big data is gradually maturing as the government devotes considerable attention to promoting the openness and sharing of judicial information and legal processes. The rich information contained in judgment documents has become a resource worth exploring in depth. There is a pressing need to enable machines to understand the semantic information stored in these documents, which are written in natural language. Therefore, using text mining technology to extract information from legal documents and store it in structured form is important for the development of legal informatization and the improvement of judicial efficiency. Entity recognition and relation extraction are text mining techniques that are crucial for legal information extraction. Text mining technology aims to transform unstructured documents into structured triplets, capturing the valid information so that the text content can be better understood and applied. We therefore propose a multi-granularity legal information extraction system for criminal cases based on a pipeline design. For structured information extraction from legal documents, we define a storage structure for criminal case information and construct a rule set to extract the case information from the documents. For the fine-grained information in the fact description text, we train a neural-network-based triplet extraction model to predict the entities and relations in the text. To mitigate the impact of entities that do not participate in any relation, we apply different strategies for training the relation extraction module and further improve the performance of the triplet extraction model. In addition, we propose a joint entity and relation extraction model with legal feature enhancement, in order to learn the interaction between entity recognition and relation extraction and to take advantage of legal
domain knowledge. The model integrates legal features into the encoder through a self-attention mechanism and obtains vector representations of the triplets with a sequence-to-sequence architecture. These vectors are then used for entity location and relation prediction. The experimental results show that the proposed method effectively improves model performance by introducing the legal-feature-integrated encoder and learning the interaction between entities and relations. Finally, in consideration of practical requirements, we explore the application of multi-task learning to entity and relation extraction on multi-crime legal documents. Through a crime classification task based on the same inputs, the text modeling ability of the triplet extraction model is strengthened. Compared with the single-task model, the multi-task models improve the F1-score by 1.6% on the triplet extraction task. Moreover, the proposed multi-task models yield improvements on the datasets of different crimes, which verifies the effectiveness of our models. |
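
The abstract reports results as an F1-score on triplet extraction but does not say how the score is computed. As a minimal sketch, assuming exact-match, micro-averaged scoring over (head entity, relation, tail entity) triplets, the metric could be evaluated as follows. The function name `triplet_f1`, the entity strings, and the relation labels are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# A triplet is (head entity, relation label, tail entity).
# The string representation is an assumption made for this sketch.
Triplet = Tuple[str, str, str]


def triplet_f1(
    gold: Iterable[Set[Triplet]],
    pred: Iterable[Set[Triplet]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact triplet matching.

    `gold` and `pred` hold one set of triplets per document, in the same order.
    A predicted triplet counts as correct only if head, relation and tail
    all match a gold triplet of the same document.
    """
    true_pos = n_pred = n_gold = 0
    for gold_set, pred_set in zip(gold, pred):
        true_pos += len(gold_set & pred_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical fact-description triplets for a single document.
gold_docs = [{("Zhang", "stole_from", "Li"), ("Zhang", "used_tool", "crowbar")}]
pred_docs = [{("Zhang", "stole_from", "Li"), ("Li", "used_tool", "crowbar")}]

precision, recall, f1 = triplet_f1(gold_docs, pred_docs)
print(precision, recall, f1)
```

Under exact matching, the second predicted triplet is counted as wrong because its head entity differs from the gold one, even though the relation and tail are correct. Evaluations that give partial credit for entity spans would need a different matching rule.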