
Research on Named Entity Recognition Algorithms and Their Implementation in Specific Fields

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Wang
Full Text: PDF
GTID: 2428330623968530
Subject: Engineering

Abstract/Summary:
Named entity recognition (NER) is the task of identifying the boundaries and predefined classes of entities in a passage or sentence. It is one of the basic tasks of natural language processing, and its results directly affect the performance of downstream tasks such as relation extraction and intelligent question answering: basic, but important. The common form of the task recognizes general predefined categories such as person and place names using large open corpora such as Wikipedia and the People's Daily. The task can also be applied to specific scenarios in particular domains, such as recognizing weapon and equipment entities in the military field or electronic medical record (EMR) entities in the medical field; this is called domain-specific named entity recognition. Because professional terminology occurs frequently in specific domains, entity boundaries are hard to identify accurately, which causes recognition errors; nested and compound entities are also common, and the scarcity of training corpora leads to classification errors.

Taking medical electronic records as an example, this paper constructs two models to address the data characteristics and difficulties of domain-specific NER, and uses real EMR data for concrete analysis and model prediction. The main work and innovations of this paper are as follows:

1) The first named entity recognition model proposed in this paper improves on the LSTM-CRF model so that it better recognizes professional vocabulary in specific fields and improves overall performance. We improve the LSTM encoding layer by adding a lattice structure, which takes two inputs: the character vector, and the vectors of the dictionary words that end at that character. The lattice structure integrates the character vector and the word vectors inside the LSTM memory cell. We build our own medical dictionary to improve segmentation accuracy and use word2vec to obtain the word vectors, while the input character vectors come from the pretrained BERT model. Analyzing the complexity and inefficiency of the original lattice implementation, we borrow the lattice idea but improve its structure: instead of integrating the two vectors inside the memory cell, we use a self-attention fusion strategy to merge the set of word vectors ending at each character before they enter the memory cell, then combine the result with the character vector, which is more efficient and easier to implement. In five groups of comparative experiments, the model identifies and classifies predefined entities better; after cross-validation, its F1 score exceeds the comparison models by 1-3 points.

2) Because domain-specific dictionaries were found to promote the recognition of entity boundaries, the second model proposed in this paper borrows the idea of multi-task learning: on the same domain-specific dataset, training different but related tasks can extract common features that improve the performance of each single task. We construct an adversarial training framework that jointly trains NER and Chinese word segmentation (CWS) to extract their common information, and add a multi-head attention mechanism to the LSTM layer of the CWS task and the shared feature layer to improve performance. In comparative experiments, the joint NER-CWS training outperforms the single-task NER model.

3) Looking forward to possible directions for improvement, generative models and active learning methods could be applied to acquiring and expanding labeled corpora in specific fields; we could also explore increasing the number of tasks in multi-task learning, or adopting a stronger pretrained model as computing power grows. These are future directions for this task.
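The self-attention fusion step described in 1) can be sketched in NumPy. The abstract does not give dimensions or projection details, so everything below (the toy dimension `d`, the projection matrices `Wq`, `Wk`, `Wv`, and concatenation as the final combination step) is an illustrative assumption, not the thesis's exact implementation: the character vector attends over the set of word vectors ending at that character, and the attended summary is combined with the character vector before entering the LSTM cell.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_char_with_words(char_vec, word_vecs, Wq, Wk, Wv):
    """Self-attention fusion (sketch): the character queries the word
    vectors ending at this position; the attended word summary is
    concatenated with the character vector before the LSTM memory cell."""
    q = char_vec @ Wq                        # query from the character, shape (d,)
    K = word_vecs @ Wk                       # keys from candidate words, (n_words, d)
    V = word_vecs @ Wv                       # values from candidate words, (n_words, d)
    scores = K @ q / np.sqrt(q.shape[0])     # scaled dot-product scores, (n_words,)
    attn = softmax(scores)                   # attention weights over the word set
    word_summary = attn @ V                  # fused word representation, (d,)
    return np.concatenate([char_vec, word_summary])

rng = np.random.default_rng(0)
d = 8
char_vec = rng.normal(size=d)                # stand-in for a BERT character embedding
word_vecs = rng.normal(size=(3, d))          # word2vec vectors of words ending here
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
fused = fuse_char_with_words(char_vec, word_vecs, Wq, Wk, Wv)
print(fused.shape)  # (16,)
```

Because the word-set fusion happens before the memory cell, each character position produces a fixed-size vector regardless of how many dictionary words end there, which is what makes this variant simpler than the original lattice cell.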
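For the adversarial NER-CWS framework in 2), one common way to implement adversarial feature sharing is a gradient-reversal layer on top of the shared encoder: a discriminator tries to guess which task a sentence came from, and its gradient is negated before reaching the shared layer, pushing the shared features to be task-invariant. The abstract does not specify this mechanism, so the sketch below (the toy shapes, the logistic discriminator, and the scaling factor `lam`) is an assumption used purely for illustration.

```python
import numpy as np

def grad_reverse_backward(grad, lam=1.0):
    """Gradient-reversal layer: identity in the forward pass, multiplies
    incoming gradients by -lam in the backward pass, so the shared
    encoder learns features that FOOL the task discriminator."""
    return -lam * grad

rng = np.random.default_rng(1)
d_in, d_shared = 6, 4
W_shared = rng.normal(size=(d_in, d_shared))   # shared feature layer (toy)
w_disc = rng.normal(size=d_shared)             # task discriminator weights (toy)

x = rng.normal(size=d_in)            # stand-in for a sentence representation
h = np.tanh(x @ W_shared)            # shared features read by both NER and CWS heads
logit = h @ w_disc                   # discriminator: "NER corpus or CWS corpus?"
p_task = 1 / (1 + np.exp(-logit))    # predicted probability of the NER source

# Backward pass of the discriminator's cross-entropy loss (true label = NER = 1):
g_h = (p_task - 1.0) * w_disc        # gradient w.r.t. the shared features
# Reverse it before it flows into the shared encoder, so the shared layer
# extracts the task-invariant "common information" described in the text.
g_h_shared = grad_reverse_backward(g_h, lam=0.05)
print(g_h_shared.shape)  # (4,)
```

In a full training loop, the total objective would combine the NER loss, the CWS loss, and this reversed discriminator loss, so that joint training improves boundary recognition without letting task-specific noise leak into the shared layer.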
Keywords/Search Tags: Named Entity Recognition, EMR, BERT, multi-task adversarial learning