Deep Learning Approach For Medical Named Entity Recognition

Posted on: 2020-08-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Xu
Full Text: PDF
GTID: 1364330572479190
Subject: Computer applications engineering
Abstract/Summary:
Medical named entity recognition plays an important role in biomedical research and has attracted extensive study in recent years. However, three problems remain. First, recognition accuracy: the number of new medical entities is increasing rapidly, while the accuracy of traditional recognition methods is not high enough. Second, computational efficiency: existing deep learning-based recognition methods are not structurally efficient. Third, multi-category recognition: approaches that identify several categories of medical entities are lacking.

To improve the accuracy of medical named entity recognition, semantic-based deep learning approaches are studied. A character-based BiLSTM-CRF (CBLC) is proposed to capture the internal structure of a word through character-level word embeddings. A semantic BiLSTM-CRF (SBLC) is proposed, which trains word embeddings on a large body of medical resources carrying semantic information, uses BiLSTM-CRF to capture the relationship between the semantic context and the labels, and incorporates Ab3P to recognize abbreviations effectively. The results show that CBLC outperforms widely used baselines such as random-field and dictionary-matching methods, and that SBLC outperforms advanced approaches such as DNorm and TaggerOne.

Building on these semantic approaches, and in order to address rare medical entity recognition and entity tagging inconsistency, a trie-based medical dictionary matching approach is first designed, and then two deep learning approaches that integrate dictionary attention are proposed: Dic-Att-BiLSTM-CRF (DABLC) and Dic-Att-BiGRU-CRF (DABGC). DABLC integrates dictionary matching and document-level attention into BiLSTM-CRF as a weighted combination. In DABGC, the input text is matched against the medical dictionary; at the same time, a bidirectional GRU network processes the word embeddings and outputs hidden states containing context information, and a multi-head attention mechanism analyzes the structure between words. DABLC and DABGC can effectively use external dictionary resources to recognize rare and complex medical entities, further improving the accuracy of the deep learning approaches.

To improve the computational efficiency of the deep learning approaches, two accelerated approaches are proposed. First, Att-SGRU-CRF (ASC) improves training speed by using a sliced GRU network and a hierarchical computational structure. An attention mechanism is used to address entity tagging inconsistency, and a CRF computes the optimal label sequence. Second, an attention-based iterated dilated convolutional network (AIDC) is proposed, which combines an iterated dilated convolutional network (IDC) with a multi-head attention approach. AIDC feeds the word embeddings into the iterated dilated convolutional network to accelerate training, and produces the final labels by combining the multi-head attention mechanism with a CRF. Compared with traditional neural networks, ASC is 50 times faster and obtains a higher F1 score at the same time. AIDC is 1.9 times faster than BiLSTM while maintaining high recognition accuracy. The computational efficiency of the deep learning approach is thus improved.

To address multi-category medical entity recognition, an approach named Text Classification Weighted Voting (TCWV) is proposed. Using a rank-constrained linear text classification model, texts are classified efficiently from a small amount of training text. TCWV integrates multiple deep learning approaches through a weighted voting algorithm, and different categories of medical text are used as input to train word embeddings for the different named entity categories. On the disease, chemical and gene datasets, TCWV obtains the highest F1 score, achieving the goal of multi-category medical named entity recognition.

The experimental results show that the proposed methods solve some of the problems of current deep learning methods in medical named entity recognition, i.e., low recognition accuracy, low computational efficiency, and the lack of multi-category medical entity recognition. This work has a positive effect on research in medical informatics.
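
The abstract mentions a trie-based medical dictionary matching step but gives no details of it. The following is only a minimal illustrative sketch of how such matching is commonly done, not the dissertation's implementation. It assumes whitespace tokenization, lowercase matching, greedy longest-match, and BIO output tags. The class and function names (`DictionaryTrie`, `tag_with_dictionary`) and the sample dictionary entries are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a token-level trie (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None  # set when a dictionary entry ends here


class DictionaryTrie:
    """Stores dictionary phrases as paths of lowercase tokens."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, phrase: str, label: str) -> None:
        node = self.root
        for token in phrase.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.label = label

    def longest_match(self, tokens: List[str], start: int) -> Optional[Tuple[int, str]]:
        """Return (end_index_exclusive, label) of the longest entry starting at `start`."""
        node = self.root
        best: Optional[Tuple[int, str]] = None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i].lower())
            if node is None:
                break
            if node.label is not None:
                best = (i + 1, node.label)
        return best


def tag_with_dictionary(tokens: List[str], trie: DictionaryTrie) -> List[str]:
    """Greedy left-to-right longest-match tagging with BIO labels (an assumed scheme)."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = trie.longest_match(tokens, i)
        if match is None:
            i += 1
            continue
        end, label = match
        tags[i] = "B-" + label
        for j in range(i + 1, end):
            tags[j] = "I-" + label
        i = end  # skip past the matched entity
    return tags


if __name__ == "__main__":
    trie = DictionaryTrie()
    trie.add("lung cancer", "Disease")   # illustrative entries only
    trie.add("cancer", "Disease")
    trie.add("cisplatin", "Chemical")
    sentence = "Cisplatin is used to treat lung cancer".split()
    print(list(zip(sentence, tag_with_dictionary(sentence, trie))))
```

In a dictionary-attention model of the kind the abstract describes, tags like these would plausibly serve as an additional input feature alongside the word embeddings. How DABLC and DABGC actually weight them is not specified here.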
Keywords/Search Tags: Medical named entity recognition, Deep learning, Ensemble learning, Natural language processing