With the continuous development of biomedicine and biomedical technology, and with the digitization and informatization of biomedical systems, the volume of biomedical data, including medical literature and electronic medical records, has grown rapidly. Extracting relevant biomedical entities from large amounts of unstructured biomedical text has therefore become a research hotspot. Although named entity recognition (NER) is a mature task in natural language processing, medical texts differ from texts in other domains in two respects. First, entity structures are complex and diverse: nested entities, discontinuous entities, and partially overlapping entities are common, and the traditional sequence-tagging schema cannot represent them. Second, the knowledge threshold is high: annotators need expertise in both the biomedical domain and machine-learning annotation, which makes annotation error-prone and lowers its quality. In addition, NER datasets constructed by distant supervision perform poorly and contain a large amount of noise. This paper therefore focuses on two problems: complex entity recognition and noise-robust training for NER.

To handle the complex structure of biomedical named entities, we propose a route-aware model for recognizing entities with diverse structures. Its schema represents every entity in a sentence without ambiguity, unifying complex entity recognition in a single framework. On the CADEC and DDI datasets, the method improves F1 on discontinuous entities by 2.3% and 0.6%, respectively, while achieving competitive F1 over all entities.

We also propose an epoch-based learning method for noisy datasets, aimed at biomedical NER datasets that contain many noisy samples. Instead of training multiple models simultaneously, we jointly optimize the current model together with checkpoints retained from previous epochs. A consistency loss encourages the current model's predictions to agree with those of a previous checkpoint, which prevents overfitting to noise. In addition, Gaussian noise whose magnitude increases with the training epoch is introduced, so that the model fits the correctly labeled samples in early epochs and avoids fitting noisy samples in later ones. Experimental results show that our method achieves performance close to that of the comparison model while reducing computational cost.
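The noise-robust objective described above can be illustrated for a single token as a supervised loss plus a consistency term against an earlier checkpoint, with Gaussian noise whose scale grows over training. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear noise schedule, the choice of KL divergence as the consistency loss, injecting the noise into the current model's logits, and all names (`noise_std`, `token_loss`, `consistency_weight`, `sigma_max`) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noise_std(epoch: int, num_epochs: int, sigma_max: float) -> float:
    # Assumed linear schedule: 0 at the first epoch, sigma_max at the last one.
    if num_epochs <= 1:
        return sigma_max
    return sigma_max * epoch / (num_epochs - 1)


def cross_entropy(probs: Sequence[float], label: int) -> float:
    # Supervised loss against the (possibly noisy) distant-supervision tag.
    return -math.log(max(probs[label], 1e-12))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q): p is the frozen checkpoint's distribution, q the current model's.
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def token_loss(
    current_logits: Sequence[float],
    checkpoint_logits: Sequence[float],
    label: int,
    epoch: int,
    num_epochs: int,
    consistency_weight: float = 1.0,
    sigma_max: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    rng = rng or random.Random(0)
    sigma = noise_std(epoch, num_epochs, sigma_max)
    # Assumption: perturb the current logits with Gaussian noise that grows per epoch.
    noisy = [x + rng.gauss(0.0, sigma) for x in current_logits]
    q = softmax(noisy)
    # The checkpoint saved at an earlier epoch acts as a constant target.
    p = softmax(checkpoint_logits)
    return cross_entropy(q, label) + consistency_weight * kl_divergence(p, q)
```

In a real trainer, `checkpoint_logits` would come from a copy of the model saved at the end of an earlier epoch and evaluated without gradients, so only one model is being optimized at a time; this is what keeps the cost below that of training several peer models jointly.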