
Research On Key Technologies Of Element Extraction In Legal Instruments For Smart Court

Posted on: 2021-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: X Q Wang  Full Text: PDF
GTID: 2416330611998644  Subject: Computer Science and Technology
Abstract/Summary:
As artificial intelligence becomes more widespread, the application of deep learning in the legal field has attracted extensive attention. The construction of the three major platforms for judicial disclosure has promoted the informatization of the legal field, providing a large-scale database that makes it convenient to retrieve documents from previous cases and to analyze criminal phenomena. At the same time, there is an urgent need for an efficient way to use this huge amount of data to help legal practitioners read and analyze legal instruments rapidly and accurately. This paper is oriented toward the smart court: we study and compare element extraction techniques for legal instruments based on named entity recognition (NER) with deep learning models. Named entity recognition is a fundamental task in natural language processing. Named entities extracted from legal instruments contain information that describes the case, so they help legal practitioners grasp the key content of an instrument in a very short time, improve work efficiency, and provide a reference for practitioners. This also makes the extraction process fundamental to knowledge graph construction in the legal field.

After reading and analyzing a large number of legal instruments in depth, we summarized nine key elements: time, location, organization, defendant, victim, amount of money, object, injury, and crime. These elements describe important information in a case and therefore have a great influence on the conviction and sentencing of the defendant. Because no existing data set met our requirements in terms of entity categories, we selected nearly 800 legal instruments from the CAIL2018 data set and built an annotated data set by labeling each character according to the entity categories.

We applied four deep learning models to the nine-category named entity recognition task and then compared and analyzed the results. First, we used the BiLSTM-CRF model. We used Word2vec to train character vectors, which served as input to the BiLSTM layer for encoding. The BiLSTM addresses the long-distance dependency problem; its hidden layer outputs vectors containing context features, which are then processed by the Viterbi algorithm in the CRF layer to output the label sequence. This model obtained an F1 value of 84.02% on our data set. To improve the model, we tried the CNN-BiLSTM-CRF model. The CNN was added to capture character-level semantic features through convolution, improving the F1 value by 7.28 percentage points to 91.30%.

BERT has been a highly regarded deep learning model in recent years, yet studies applying BERT to the smart court are limited. Word2vec cannot handle polysemy and cannot capture long-distance features within a sequence. To avoid these defects, we also used BERT to train character vectors. Because its bidirectional Transformer encoder reads the whole sequence at once, it can capture features between characters no matter how far apart they are. The F1 value of the BERT-CRF model is 80.17%. With a BiLSTM added to the model, we were able to extract further semantic features from the output vectors of BERT, and the F1 value improved by 5.32%. The CNN-BiLSTM-CRF model has the best performance in this paper.
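The Viterbi decoding step mentioned for the CRF layer can be illustrated with a minimal sketch. It is not the thesis's implementation: the tag names (`O`, `B-TIME`, `I-TIME`), the BIO-style tagging scheme, and all scores below are hypothetical values chosen for illustration, and the emission scores stand in for what an encoder such as a BiLSTM would produce for each character.

```python
from typing import Dict, List


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag]      : score of `tag` at position t (encoder output)
    transitions[prev][cur] : score of moving from tag `prev` to tag `cur`
    """
    if not emissions:
        return []
    # Best score of any path that ends in each tag at position 0.
    score = {tag: emissions[0][tag] for tag in tags}
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = {}, {}
        for cur in tags:
            # Pick the previous tag that maximizes the path score into `cur`.
            best_prev = max(tags, key=lambda p: score[p] + transitions[p][cur])
            new_score[cur] = (score[best_prev] + transitions[best_prev][cur]
                              + emissions[t][cur])
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: score[tag])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and transition scores (assumptions, not from the thesis).
tags = ["O", "B-TIME", "I-TIME"]
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I-TIME"] = -10.0      # an I- tag should not follow O
transitions["B-TIME"]["I-TIME"] = 1.0
transitions["I-TIME"]["I-TIME"] = 0.5

# Per-character emission scores for a two-character input (made up).
emissions = [
    {"O": 1.0, "B-TIME": 0.8, "I-TIME": 0.0},
    {"O": 0.5, "B-TIME": 0.0, "I-TIME": 1.0},
]
print(viterbi_decode(emissions, transitions, tags))
```

Picking the best tag independently at each position would give `O` followed by `I-TIME`, which is an invalid sequence under this scheme. Because the decoder scores whole paths using the transition scores, it selects `B-TIME`, `I-TIME` instead, which is the kind of constraint a CRF layer contributes on top of the encoder.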
Keywords/Search Tags: Named entity recognition, Long short-term memory network, Convolutional neural network, Conditional random field, BERT, Attention mechanism