Research On Biomedical Named Entity Recognition Algorithm Based On Multi-Task Learning

Posted on: 2022-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhan
GTID: 2480306542963659
Subject: Computer Science and Technology
Abstract/Summary:
Biomedical text mining can automatically extract key biomedical knowledge from the massive biomedical literature, and it plays an important role in building biomedical knowledge graphs and databases. Biomedical named entity recognition, which extracts biomedical entity information from text, is one of the most fundamental tasks in text mining. Many deep learning-based algorithms have already been applied to biomedical named entity recognition. However, existing algorithms still do not perform well because of polysemous terms among biomedical entities, the difficulty of detecting entity boundaries accurately, and the lack of labeled data. Multi-task learning can improve the performance of each task by introducing inductive bias from related tasks. This thesis therefore proposes methods that use auxiliary tasks, such as trigger word detection and language modeling, to assist biomedical named entity recognition. The main research work of this thesis is as follows:

(1) A biomedical trigger detection and named entity recognition algorithm based on multi-task learning (MTL-TD-NER). At present, the lack of labeled datasets seriously limits the performance of deep learning-based named entity recognition algorithms. Named entities and triggers are both key information for describing biomedical events, and named entity recognition and trigger detection share mutually beneficial features. The algorithm first uses a feature extraction layer with hard parameter sharing to extract the features common to entity recognition and trigger detection simultaneously, then processes the two tasks with separate classification layers, and finally passes the output of each task to the other task's classification layer so that the two tasks help each other in classification. On the MLEE dataset, experimental results show that MTL-TD-NER detects trigger words and recognizes named entities better than single-task trigger detection and named entity recognition algorithms.

(2) A multi-task biomedical named entity recognition algorithm based on a character-level unsupervised language model (MTL-CLM-NER). Pre-trained word vectors are currently usually static, so they cannot handle out-of-vocabulary (OOV) words well, and linguistic information from the specific biomedical domain helps improve their quality. The algorithm first builds an unsupervised character-level language model on the target dataset to learn the character-level contextual information of that domain. It then uses the linguistic knowledge learned by the character-level language model to dynamically adjust the word vectors in the named entity recognition model, and finally trains the language model and the named entity recognition model jointly. On the Bacteria and JNLPBA datasets, experimental results show that MTL-CLM-NER achieves better named entity recognition performance than existing deep learning-based named entity recognition algorithms.
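The MTL-TD-NER architecture described in (1) can be illustrated with a minimal forward-pass sketch in plain Python. Everything concrete here is an assumption rather than the thesis's implementation: the single tanh layer standing in for the shared feature extraction layer, the names `SharedNerTrigger` and `joint_loss`, the use of each task's preliminary softmax distribution as the "result information" passed to the other task, and the loss weight `alpha`. Training is omitted; an autodiff framework would normally supply the gradients.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedNerTrigger:
    """Hypothetical toy model: hard parameter sharing for NER + trigger detection.

    One shared encoder feeds two task heads. Each head first makes a
    preliminary prediction; each task's final layer then also sees the
    OTHER task's preliminary label distribution (assumed form of the
    cross-task information passing).
    """

    def __init__(self, emb_dim, hid_dim, n_ner, n_trig, seed=0):
        rng = random.Random(seed)
        self.shared = rand_matrix(hid_dim, emb_dim, rng)           # used by both tasks
        self.ner_pre = rand_matrix(n_ner, hid_dim, rng)            # preliminary NER head
        self.trig_pre = rand_matrix(n_trig, hid_dim, rng)          # preliminary trigger head
        self.ner_out = rand_matrix(n_ner, hid_dim + n_trig, rng)   # also sees trigger output
        self.trig_out = rand_matrix(n_trig, hid_dim + n_ner, rng)  # also sees NER output

    def forward(self, token_embs):
        ner_probs, trig_probs = [], []
        for x in token_embs:
            h = [math.tanh(v) for v in matvec(self.shared, x)]  # shared features
            p_ner = softmax(matvec(self.ner_pre, h))
            p_trig = softmax(matvec(self.trig_pre, h))
            # list concatenation: shared features + the other task's distribution
            ner_probs.append(softmax(matvec(self.ner_out, h + p_trig)))
            trig_probs.append(softmax(matvec(self.trig_out, h + p_ner)))
        return ner_probs, trig_probs


def cross_entropy(probs, gold):
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)


def joint_loss(ner_probs, trig_probs, ner_gold, trig_gold, alpha=0.5):
    """Weighted sum of both task losses; alpha is an assumed hyperparameter."""
    return (alpha * cross_entropy(ner_probs, ner_gold)
            + (1 - alpha) * cross_entropy(trig_probs, trig_gold))


# Usage: two tokens with 3-d embeddings, 3 NER tags, 2 trigger tags (toy values).
tokens = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
model = SharedNerTrigger(emb_dim=3, hid_dim=4, n_ner=3, n_trig=2)
ner, trig = model.forward(tokens)
loss = joint_loss(ner, trig, ner_gold=[1, 0], trig_gold=[0, 1])
```

Passing the other task's label distribution rather than its hard labels keeps the cross-task connection differentiable, which is one reasonable way to let both classification layers be trained jointly.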
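The MTL-CLM-NER idea described in (2) can be sketched in the same spirit, again as a hypothetical illustration only. A tiny Elman RNN stands in for the character-level language model (the abstract does not specify the architecture); the "dynamic adjustment" of word vectors is assumed to be concatenation of a word's static vector with the language model's hidden state at the word's last character; and `lam` is an assumed weight in the joint objective L = L_NER + lam · L_LM. The names `CharLM`, `dynamic_word_vectors`, and `joint_loss` are invented for this sketch, and only the forward pass and the combined loss are shown.

```python
import math
import random
import string


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]


class CharLM:
    """Hypothetical toy Elman-RNN character language model (assumed stand-in)."""

    def __init__(self, alphabet, hid_dim, seed=0):
        rng = random.Random(seed)
        self.idx = {c: i for i, c in enumerate(alphabet)}
        self.w_x = rand_matrix(hid_dim, len(alphabet), rng)
        self.w_h = rand_matrix(hid_dim, hid_dim, rng)
        self.w_o = rand_matrix(len(alphabet), hid_dim, rng)

    def run(self, text):
        """Return the hidden state after each character and the mean next-char NLL."""
        h = [0.0] * len(self.w_h)
        states, nll = [], 0.0
        for t, ch in enumerate(text):
            col = self.idx[ch]  # one-hot input == selecting a column of w_x
            rec = matvec(self.w_h, h)
            h = [math.tanh(self.w_x[i][col] + rec[i]) for i in range(len(h))]
            states.append(h)
            if t + 1 < len(text):  # unsupervised target: the next character
                p = softmax(matvec(self.w_o, h))
                nll -= math.log(p[self.idx[text[t + 1]]])
        return states, nll / max(len(text) - 1, 1)


def dynamic_word_vectors(words, static_emb, static_dim, lm):
    """Concatenate each word's static vector (zeros if OOV) with the char-LM
    state at the word's last character, so the vector depends on context."""
    text = " ".join(words)
    states, lm_loss = lm.run(text)
    vectors, end = [], 0
    for w in words:
        end += len(w)
        static = static_emb.get(w, [0.0] * static_dim)
        vectors.append(static + states[end - 1])
        end += 1  # skip the separating space
    return vectors, lm_loss


def joint_loss(ner_probs, gold, lm_loss, lam=0.5):
    """NER cross-entropy plus lam * char-LM loss (lam is an assumed weight)."""
    ner_loss = -sum(math.log(p[g]) for p, g in zip(ner_probs, gold)) / len(gold)
    return ner_loss + lam * lm_loss


# Usage: "IL-2" has a toy static vector, "gene" is OOV; 3 tags, arbitrary gold indices.
lm = CharLM(string.ascii_letters + string.digits + " -", hid_dim=8)
vecs, lm_loss = dynamic_word_vectors(["IL-2", "gene"], {"IL-2": [0.2, -0.1, 0.4]}, 3, lm)
ner_w = rand_matrix(3, 3 + 8, random.Random(1))  # hypothetical NER output layer
ner_probs = [softmax(matvec(ner_w, v)) for v in vecs]
loss = joint_loss(ner_probs, [1, 0], lm_loss)
```

Because the character-level state depends on the preceding characters, the same word receives different vectors in different contexts, and a word missing from the static vocabulary (here `gene`) still gets a non-trivial representation from its characters.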
Keywords/Search Tags: Biomedical Text Mining, Named Entity Recognition, Language Model, Deep Learning