
Research On Named Entity Recognition Based On Deep Learning

Posted on: 2020-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2417330572975984
Subject: Applied statistics
Abstract/Summary:
In today's rapidly developing society, the pace of technological progress and the continuous improvement of the Internet have become irresistible. As a result, tens of thousands or even billions of pieces of information appear every day. Named entity recognition (NER) is the task of identifying key, useful information in text data, such as the names of people, places, and institutions, as well as other miscellaneous entities, and it meets people's need to capture important information from text quickly. After years of research, recognition methods have evolved from rule-based approaches to traditional statistical methods. The rise of deep learning in recent years has led researchers to learn named entity features and perform NER with deep learning methods that require no manual participation in the training process.

Although NER has achieved remarkable results in many areas, such as the medical and military fields, some areas remain underexplored, including the legal field, which is closely related to people's lives. Since 2017, China has repeatedly published important documents to speed up the construction of legal intelligence. Establishing intelligent judgments and intelligent courts is an urgent problem, and NER for legal texts is the first step toward solving it. However, existing research on named entity recognition contains few results aimed specifically at the legal field. This thesis therefore adopts a deep learning-based approach to recognize named entities in a specific domain.

First, this thesis experimentally compares the NER performance of traditional statistical methods and deep learning methods. To compare the two from multiple angles, it first studies the theory behind two traditional statistical models, the hidden Markov model and the conditional random field (CRF) model, and identifies the limitations of traditional statistical methods on the NER task. For example, the hidden Markov model cannot capture the context of the text well, and the recognition performance of the CRF model depends too heavily on its feature template. Next, to compare the entity recognition performance of the two approaches, this thesis chooses a news text corpus, which is close to everyday life, as the test subject. The work includes the following steps. First, recent Sohu news data from 18 channels, including domestic, international, sports, social, and entertainment, are preprocessed and the named entities are labeled. Then, the LSTM-CRF deep learning model with additional gates is used to recognize people's names, place names, organization names, and other miscellaneous entities, and its results are compared with those of the CRF model from the traditional statistical methods. The experimental results show that although the CRF model runs faster than the deep learning model, its performance is largely limited by the preset feature template, and, unlike the deep learning method, it cannot learn additional relevant features from the data.

After these comparative experiments, this thesis extends named entity recognition to a legal text corpus and selects criminal case legal documents as the recognition object. First, because of the particular composition of legal texts, the named entities are divided into four types during data preprocessing: people's names, place names, institution names, and criminal charges. A total of 183 criminal charges are manually added during entity labeling. Then, the LSTM-CRF deep learning model with additional gates is applied to the preprocessed text data, and enlarging the word embedding layer yields better recognition of criminal charges. Analysis of the experimental results also reveals some regularities in the composition of legal documents. Finally, a comparison with the experimental results of the Bi-LSTM-CRF model shows that the LSTM-CRF model with additional gates used in this thesis achieves better entity recognition results when applied to named entities in the legal field.
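As an illustration of the four-type entity labeling described for the legal corpus, the following is a minimal sketch of how labeled entities might be read back from a tagged sequence. It assumes a character-level BIO tagging scheme and the label names `PER`, `LOC`, `ORG`, and `CRIME`; neither the scheme nor the names are stated in the abstract, and the example sentence is invented for illustration. The function `extract_entities` is hypothetical and is not the thesis's own code.

```python
from typing import List, Optional, Tuple

# Assumed label names for the four entity types named in the abstract:
# people's names, place names, institution names, criminal charges.
ENTITY_TYPES = {"PER", "LOC", "ORG", "CRIME"}


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from a BIO-tagged sequence."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append(("".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") and tag[2:] in ENTITY_TYPES:
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and tag[2:] == current_type:
            # An I- tag continues the current entity of the same type.
            current_tokens.append(token)
        else:
            # "O", an unknown label, or a stray I- tag ends the current entity.
            flush()
    flush()
    return entities


# Invented example: "Zhang San committed the crime of theft".
tokens = list("张三犯盗窃罪")
tags = ["B-PER", "I-PER", "O", "B-CRIME", "I-CRIME", "I-CRIME"]
entities = extract_entities(tokens, tags)
print(entities)
```

In a pipeline like the one described, the tag sequence would come from the model's CRF output layer; here it is written by hand so the sketch stays self-contained.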
Keywords/Search Tags:News text, Legal text, Deep learning model, Conditional random field model