Complicated Named Entity Recognition For Biomedical Texts

Posted on:2023-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:S K Liao

GTID:2544306914477144

Subject:Information and Communication Engineering

Abstract/Summary:

With the continuous development of the biomedical field and biomedical technology, and with the digitization of biomedical systems, the volume of biomedical data, including medical literature and electronic medical records, is growing rapidly. Extracting relevant biomedical entities from large amounts of unstructured biomedical text has therefore become an active research topic. Although named entity recognition is a mature task in natural language processing, medical texts differ from texts in other fields in two respects. On the one hand, entity structures are complex and diverse: nested, discontinuous, and partially overlapping entities are common, and the traditional sequence tagging schema cannot represent them. On the other hand, the knowledge threshold is high: annotators need both biomedical domain knowledge and familiarity with machine learning annotation, so annotations tend to be error-prone and of low quality. In addition, named entity recognition data sets constructed by distant supervision yield poor results and contain a large amount of noise. This thesis therefore studies complex entity recognition and noise-robust training methods for named entity recognition.

To handle the complex structure of named entities in the biomedical field, we propose a route-aware model for recognizing entities with diverse structures. The schema can represent all entities in a sentence without ambiguity, which unifies complex entity recognition in a single framework. The method is evaluated on the CADEC and DDI data sets: the F1 score on discontinuous entities improves by 2.3% and 0.6%, respectively, and the F1 score over all entities is also relatively good.

We also propose an epoch-based method for learning from noisy data sets, aimed at biomedical named entity recognition data sets that contain many noisy samples. Checkpoints retained by the model from past epochs are used together with the current model for joint optimization, which avoids training multiple models at the same time. A consistency loss encourages the model to make predictions consistent with the previous checkpoint, which prevents overfitting to noise. In addition, Gaussian noise that increases with the training epochs is introduced, so that the model fits the correct samples in the early stage and is prevented from fitting noisy samples in the later stage. Experimental results show that the method achieves performance close to that of the comparison model while reducing the computational cost.
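The abstract does not give the exact form of the consistency loss or say where the Gaussian noise is injected, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes a per-token loss made of cross-entropy on the (possibly noisy) label plus a KL-divergence term toward the previous checkpoint's prediction, with Gaussian noise added to the current model's logits and a standard deviation that grows linearly with the epoch. The names `noise_std`, `token_loss`, `max_std`, and `weight` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the (possibly noisy) annotated label."""
    return -math.log(max(probs[label], 1e-12))


def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions given as lists."""
    return sum(
        pi * math.log(max(pi, 1e-12) / max(qi, 1e-12))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def noise_std(epoch, total_epochs, max_std=0.5):
    """Assumed schedule: std grows linearly from 0 to max_std over training."""
    return max_std * epoch / max(total_epochs - 1, 1)


def token_loss(current_logits, checkpoint_logits, label,
               epoch, total_epochs, weight=1.0, rng=random):
    """Supervised loss plus a consistency term toward the previous checkpoint.

    Assumption: the Gaussian noise perturbs the current model's logits.
    """
    std = noise_std(epoch, total_epochs)
    noisy_logits = [x + rng.gauss(0.0, std) for x in current_logits]
    p_current = softmax(noisy_logits)
    p_previous = softmax(checkpoint_logits)  # frozen checkpoint, no gradient
    supervised = cross_entropy(p_current, label)
    consistency = kl_divergence(p_previous, p_current)
    return supervised + weight * consistency
```

In this sketch, a token whose noisy label disagrees with the previous checkpoint's prediction incurs a consistency penalty for moving toward that label, while the growing noise makes late-epoch memorization of individual samples harder. Only one extra forward pass through a stored checkpoint is needed, which matches the abstract's point about avoiding several simultaneously trained models.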

Keywords/Search Tags:

biomedicine, complex named entity, noise robust, named entity recognition

Related items

1	Research On Named Entity Recognition And Normalization For Chinese Biomedical Texts
2	Named Entity Recognition In Medical Field Based On Deep Learning Of Chinese
3	Research On Named Entity Recognition And Entity Relationship Extraction Of Medical Data Text Based On Attention
4	Research And Implementation Of Chinese Named Entity Recognition Algorithm For Medical Field
5	Research On Named Entity Recognition And Normalization From Biomedical Text
6	Named Entity Recognition For Medical Field
7	Research On Named Entity Recognition Technology For TCM Field
8	Research On Chinese Named Entity Recognition In Medical Field
9	Construction And Research Of Chinese Electronic Medical Record Named Entity Recognition Corpus
10	Deep Learning Based Medical Named Entity Recognition