
Research On Chinese Named Entity Recognition Based On Deep Learning

Posted on: 2021-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: N Chang
Full Text: PDF
GTID: 2518306107489704
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of network technology, the Internet has become the carrier of ever more information, and online platforms generate massive amounts of text data every day. How to process and use these data efficiently is a hot topic of current research. Named entity recognition extracts entities of specified types from text and classifies them; common entity types include person, location, and organization names. Entity recognition can be modeled as a sequence labeling task and is fundamental work in natural language processing, of great significance for downstream tasks such as relation extraction, machine translation, and question answering.

Because of the particular characteristics of Chinese, many problems in Chinese named entity recognition remain unsolved, such as entity boundary segmentation and out-of-vocabulary word recognition. To address these problems, this thesis proposes a character-word feature fusion model based on the self-attention mechanism, which combines character representations with word segmentation features on top of the traditional Bi-RNN + CRF architecture. The model operates at character granularity, which alleviates the out-of-vocabulary problem, and incorporates the structural features of word segmentation to repair entity boundaries. The self-attention mechanism dynamically computes similarities between the hidden states of the two granularities, learning the dependencies between any character and any segmented word.

In addition, this thesis proposes a Chinese entity recognition model based on global information with multi-granularity character and word features. A Bi-LSTM captures contextual information in both the forward and backward directions of the sequence, while the self-attention mechanism performs a global matrix computation directly against the words at every position during training, increasing the feature dimensionality. To address the problem of representing polysemous words in Chinese, a BERT pre-trained model is introduced: it conditions each position bidirectionally on its context, so the same word can be mapped to different vectors according to different contextual semantics, improving the accuracy of the input sentence representation. Order information is also extremely important in sequence tagging tasks. To remedy the position insensitivity of the self-attention mechanism, positional feature inputs are added to BERT, and the relative positions between words are encoded with trigonometric functions, further improving recognition accuracy.

Ablation experiments for the proposed models were conducted on two public datasets and compared against classic baseline models. The experimental results show that the models significantly improve performance on both datasets.
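The character-word fusion step described above can be sketched as scaled dot-product attention in which character hidden states attend over word-segmentation hidden states. This is a minimal numpy illustration, not the thesis's exact model: the function name `fuse_char_word`, the concatenation at the end, and all dimensions are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_char_word(H_char, H_word):
    """Fuse two granularities with scaled dot-product attention.

    H_char: (n_char, d) hidden states at character granularity.
    H_word: (n_word, d) hidden states at word-segmentation granularity.
    Each character queries all words, and its attended word context is
    concatenated onto its own representation, giving (n_char, 2d).
    """
    d = H_char.shape[-1]
    scores = H_char @ H_word.T / np.sqrt(d)   # (n_char, n_word) similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1 over words
    context = weights @ H_word                # (n_char, d) word context per char
    return np.concatenate([H_char, context], axis=-1)
```

In a trained model the similarity would be computed from learned query/key/value projections of the Bi-LSTM states; the sketch keeps only the attention arithmetic that lets every character depend on every segmented word.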
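The trigonometric position features mentioned above correspond to the sinusoidal positional encoding of the Transformer, whose sine/cosine pairs make relative offsets between positions expressible as linear functions of the encodings. A minimal sketch (the function name and the standard 10000 base are assumptions for the example):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even dimensions,
    cos on odd dimensions, with geometrically spaced wavelengths."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])       # even dims
    pe[:, 1::2] = np.cos(angle[:, 1::2])       # odd dims
    return pe                                   # (seq_len, d_model)
```

These vectors are simply added to the token embeddings before self-attention, giving the otherwise position-insensitive attention layers access to order information.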
Keywords/Search Tags:Chinese Named Entity Recognition, Self-attention Mechanism, Multi-granularity Feature Fusion, Global Context-aware Information