Named Entities Recognition Based On Recurrent Neural Network In Biomedical Literatures

Posted on:2017-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:L K Jin

GTID:2348330488458748

Subject:Computer application technology

Abstract/Summary:

In the biomedical field, recognizing different types of entities is the first step in many information extraction tasks, such as relation extraction, text classification, coreference resolution and event extraction. Existing methods depend heavily on rich domain expert knowledge and a large number of hand-crafted features. This thesis mainly uses pre-trained word embeddings and recurrent neural networks to build a simple and effective named entity recognition system, with a series of extensions and improvements. Both performance and generalization across different corpora are greatly improved.

First, recurrent connections are added to both the hidden layer and the output layer of a conventional Recurrent Neural Network (RNN). The hidden layer can thus maintain and record historical information, and the output layer can exploit probabilistic information from the previous state. In addition, to address the incomplete information caused by dividing text into subsequences, the Brown clustering algorithm and Latent Dirichlet Allocation (LDA) are used to provide a contextual vector associated with each word, modeling a wider range of semantic information. Two unidirectional RNNs running in opposite directions are then combined for biomedical named entity recognition, achieving an F-score of 83.62% on the BioCreative II GM corpus.

Second, to further improve recognition performance and overcome the vanishing gradient problem that conventional RNNs face on long sentences, Long Short-Term Memory (LSTM) units are introduced into the recurrent neural network, and a bidirectional recurrent neural network with LSTM units is built. Because fine-tuning changes the pre-trained word embeddings, which carry rich syntactic and semantic information, this thesis uses two different word embeddings to extend the LSTM. In addition, sentence vectors are derived from the difference between the two kinds of word embeddings. Finally, a Sentence vector/Twin word embeddings conditioned Bidirectional LSTM (ST-BLSTM) is constructed for named entity recognition. On the BioCreative II GM corpus, this framework achieves an F-score of 88.61%, which is 1.40 percentage points higher than the top contest system that combined dictionaries with multiple classifiers.

In summary, this thesis adopts two different recurrent neural networks for named entity recognition to avoid the cost of hand-crafted features. The ST-BLSTM model shows better recognition performance and generalization: on the BioCreative II GM corpus, its F-score is 4.99 percentage points higher than that of the conventional RNN, and 1.33 percentage points higher than that of a single recognition system with hand-crafted features.
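
To make the first model more concrete, the following is a minimal sketch of a forward pass through an RNN whose hidden layer and output layer both carry a recurrent connection, with two such networks run in opposite directions and combined. It is an illustration under stated assumptions, not the thesis implementation: the abstract does not give the equations, so the exact wiring (previous output distribution fed into the current output scores), the tanh and softmax activations, the averaging of the two directions, the B/I/O label set, and all names such as `init_params`, `forward` and `bidirectional_tag` are hypothetical. The weights are random and untrained, and the Brown clustering and LDA context vectors are left out.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(embed_dim, hidden_dim, num_labels, seed=0):
    """Random, untrained weights (assumed shapes, for illustration only)."""
    rng = random.Random(seed)
    return {
        "W_xh": random_matrix(hidden_dim, embed_dim, rng),   # input -> hidden
        "W_hh": random_matrix(hidden_dim, hidden_dim, rng),  # previous hidden -> hidden
        "W_hy": random_matrix(num_labels, hidden_dim, rng),  # hidden -> output
        "W_yy": random_matrix(num_labels, num_labels, rng),  # previous output -> output
    }


def forward(embeddings, params):
    """Return one label distribution per token, reading left to right."""
    hidden_dim = len(params["W_hh"])
    num_labels = len(params["W_hy"])
    h_prev = [0.0] * hidden_dim
    # Assumption: start from a uniform distribution over labels.
    y_prev = [1.0 / num_labels] * num_labels
    outputs = []
    for x in embeddings:
        # Hidden recurrence: the current word plus the previous hidden state.
        pre_h = [a + b for a, b in zip(matvec(params["W_xh"], x),
                                       matvec(params["W_hh"], h_prev))]
        h = [math.tanh(v) for v in pre_h]
        # Output recurrence: the previous label probabilities inform the scores.
        scores = [a + b for a, b in zip(matvec(params["W_hy"], h),
                                        matvec(params["W_yy"], y_prev))]
        y = softmax(scores)
        outputs.append(y)
        h_prev, y_prev = h, y
    return outputs


def bidirectional_tag(embeddings, fwd_params, bwd_params, labels):
    """Combine a left-to-right and a right-to-left RNN by averaging (assumption)."""
    fwd = forward(embeddings, fwd_params)
    bwd = forward(embeddings[::-1], bwd_params)[::-1]
    tags = []
    for p_fwd, p_bwd in zip(fwd, bwd):
        avg = [(a + b) / 2 for a, b in zip(p_fwd, p_bwd)]
        tags.append(labels[avg.index(max(avg))])
    return tags


if __name__ == "__main__":
    rng = random.Random(42)
    # Four tokens with 5-dimensional stand-ins for pre-trained word embeddings.
    sentence = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
    fwd_params = init_params(5, 8, 3, seed=0)
    bwd_params = init_params(5, 8, 3, seed=1)
    print(bidirectional_tag(sentence, fwd_params, bwd_params, ["B", "I", "O"]))
```

Because the weights are untrained, the printed tags carry no meaning; the sketch only shows how information from the previous hidden state and the previous output distribution can enter each step, and how two directions can be merged per token.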

Keywords/Search Tags:

Named Entities Recognition, Word Embedding, Recurrent Neural Network, Sentence Vector, Long Short-Term Memory

Related items

1	Research On Automatic Answering Technique Of English Test
2	Long Short Term Memory Recurrent Neural Network Application To Handwritten Recognition
3	Sentence-embedding And Similarity Via Hybrid Bidirectional-LSTM And CNN Utilizing Weighted-pooling Attention
4	Research And Application Of Named Entity Recognition Based On Bidirectional LSTM
5	Research On Named Entity Recognition Of Chinese Image Reports Based On Recurrent Neural Networks
6	Online Handwritten Math Expression Label Recognition Based On Long Short Term Memory Recurrent Neural Network
7	Research On Named Entity Recognition For Chinese Weibo Text
8	Research On Chinese Named Entity Recognition Based On Deep Learning
9	Sentiment Analysis Of Short Text Based On Improved Bidirectional LSTM Neural Network
10	Research And Application On Named Entity Recognition Based On LSTM