Research On Enterprise Entity Recognition And Classification For Court Documents

Posted on: 2018-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: J W Chen
GTID: 2416330512498187
Subject: Computer technology
Abstract/Summary:
In recent years, with the boom in "Internet finance", financial institutions urgently need more advanced information processing methods to extract and analyze large amounts of Internet data. Among these data, court documents are a primary source for financial institutions because of their accuracy and authority. Named entity recognition and classification is the basis of the entity semantic analysis and entity relation extraction that these institutions perform. Mainstream named entity recognition and classification techniques currently divide entities into only a few types, so the entity types carry little semantic information. For named entity classification, current methods depend heavily on hand-crafted features and external data, so their generality and robustness cannot be guaranteed. To address these shortcomings, this thesis proposes a finer-grained categorization of the named entities relevant to financial institutions, constructs features from the semantics of the text, and finally classifies these named entities. The main work of this thesis covers three aspects:
(1) Entity mentions in court documents take many different forms, which makes entity recognition difficult. This thesis proposes an improved conditional random field (CRF) model that uses additional features generated by a dictionary and a support vector machine (SVM). Experiments on 5,000 manually annotated samples show that recall is greatly improved.
(2) Mainstream entity classification schemes define too few entity types to meet the needs of entity relation extraction and other applications. This thesis refines the entities in legal documents into 15 types and annotates 6,891 data samples.
(3) Existing entity classification models rely heavily on hand-crafted features and external data and therefore lack generality. This thesis proposes an entity semantic representation method and carries out classification experiments on the manually annotated court documents; the experiments show that the method performs well.
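The abstract does not give the CRF feature templates, so the following is only a sketch of how dictionary and SVM outputs could be added as extra per-token CRF features. It assumes Chinese tokens that are concatenated without spaces, a hypothetical lexicon of known entity names, and per-token SVM decision values precomputed by a separately trained classifier. All function names are hypothetical. The output is one feature dict per token, which is the input format accepted by CRF toolkits such as sklearn-crfsuite.

```python
from typing import Dict, List, Sequence, Set, Union

Feature = Union[str, float, bool]


def dictionary_tags(tokens: Sequence[str], lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Hypothetical helper: BIO-tag the tokens covered by the longest lexicon match."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest span first; Chinese tokens are joined without spaces.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if "".join(tokens[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            tags[i] = "B-DICT"
            for j in range(i + 1, i + matched):
                tags[j] = "I-DICT"
            i += matched
        else:
            i += 1
    return tags


def token_features(tokens: Sequence[str], dict_tags: Sequence[str],
                   svm_scores: Sequence[float], i: int) -> Dict[str, Feature]:
    """Feature dict for token i: word identity plus dictionary and SVM features."""
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word": tokens[i],
        "dict": dict_tags[i],          # dictionary-match feature
        "svm_score": svm_scores[i],    # assumed precomputed SVM decision value
    }
    if i > 0:
        feats["-1:word"] = tokens[i - 1]
        feats["-1:dict"] = dict_tags[i - 1]
    else:
        feats["BOS"] = True            # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:word"] = tokens[i + 1]
        feats["+1:dict"] = dict_tags[i + 1]
    else:
        feats["EOS"] = True            # end of sentence
    return feats


def sentence_features(tokens: Sequence[str], lexicon: Set[str],
                      svm_scores: Sequence[float]) -> List[Dict[str, Feature]]:
    """Build the CRF input sequence for one sentence."""
    if len(svm_scores) != len(tokens):
        raise ValueError("need exactly one SVM score per token")
    tags = dictionary_tags(tokens, lexicon)
    return [token_features(tokens, tags, svm_scores, i) for i in range(len(tokens))]


feats = sentence_features(["某某", "银行", "起诉", "张三"], {"某某银行"}, [0.9, 0.8, -1.2, 0.3])
print([f["dict"] for f in feats])
```

In this sketch the lexicon match and the SVM score do not decide the labels directly. They only act as extra evidence the CRF can weight during training. This is one plausible reading of "additional features generated by dictionary and SVM".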
Keywords/Search Tags: named entity recognition, fine-grained entity typing, Conditional Random Field, Word Embedding
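The abstract does not specify how the entity semantic representation is built. The following is an assumed illustration of one common word-embedding approach, not the thesis's method. It represents an entity by the average embedding of its mention words concatenated with the average embedding of its context words. It then assigns a fine-grained type by cosine similarity to per-type centroids, which makes it a nearest-centroid classifier. The toy 2-dimensional embeddings, the type labels, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_vector(words: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of known words; zero vector if none are known."""
    known = [emb[w] for w in words if w in emb]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def entity_representation(mention: Sequence[str], left_ctx: Sequence[str],
                          right_ctx: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Concatenate the mention's mean embedding with its context's mean embedding."""
    context = list(left_ctx) + list(right_ctx)
    return mean_vector(mention, emb, dim) + mean_vector(context, emb, dim)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def train_centroids(samples: List[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Average the representations of each entity type's labelled samples."""
    by_type: Dict[str, List[Vector]] = {}
    for vec, label in samples:
        by_type.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in by_type.items()}


def classify(vec: Vector, centroids: Dict[str, Vector]) -> str:
    """Pick the type whose centroid is most cosine-similar to the entity vector."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy embeddings and labels (hypothetical).
emb = {"银行": [1.0, 0.0], "法院": [0.0, 1.0], "贷款": [0.9, 0.1], "判决": [0.1, 0.9]}
train = [
    (entity_representation(["某某", "银行"], ["发放"], ["贷款"], emb, 2), "BANK"),
    (entity_representation(["某某", "法院"], ["作出"], ["判决"], emb, 2), "COURT"),
]
centroids = train_centroids(train)
print(classify(entity_representation(["某", "银行"], ["申请"], ["判决"], emb, 2), centroids))
```

Nearest-centroid classification is used here only to keep the sketch dependency-free. A classifier trained on the same vectors, such as logistic regression or a neural network, would be a more usual choice for 15 fine-grained types.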