
Research On Chinese Named Entity Recognition Based On BERT

Posted on: 2024-02-18    Degree: Master    Type: Thesis
Country: China    Candidate: S Chen    Full Text: PDF
GTID: 2568307136994899    Subject: Computer technology
Abstract/Summary:
With the advent of the Internet era, a huge amount of text data is produced on the Web every day. Much of this text carries useful information, so extracting it effectively is particularly important. Named entity recognition (NER) is a key technique for such information extraction, and improving the recognition performance of NER models has become a research hotspot in recent years. Although NER research has made significant progress, several issues remain: existing models handle Chinese polysemous words poorly and cannot fully capture the semantic information of the textual context. In addition, high-quality annotated datasets are lacking in specific domains such as finance, which limits the application of NER to financial news. In light of these issues, this thesis improves existing recognition models and applies the improved algorithms to financial news texts. The specific work is as follows:

1) To address polysemy in Chinese, where traditional embedding models such as Word2vec assign a word the same vector regardless of context and therefore cannot represent polysemous words well, BERT is used as the word embedding model, and a method for merging the outputs of BERT's multiple network layers is proposed based on an analysis of the BERT network structure. Because different encoder layers of BERT extract semantic information at different levels, this thesis proposes an FBERT-BiLSTM-CRF model that fuses the output of every BERT layer (see the layer-fusion sketch below). The fused representation has stronger feature-extraction ability and models the text better, improving both the accuracy and the efficiency of entity recognition. Experiments show that, compared with the BERT-BiLSTM-CRF baseline, the precision, recall, and F1 score of the improved model on the People's Daily dataset increase by 1.1%, 0.54%, and 0.81%, respectively, which demonstrates its superiority on entity recognition tasks.

2) A BERT-BLIA-CRF model based on dual-channel feature extraction is constructed. The model uses a BiLSTM channel to extract long-distance dependency information and an IDCNN channel to extract important local features; an attention layer then assigns different weights to the features according to their importance (see the dual-channel sketch below). The model can therefore effectively extract both contextual and local features from the text, improving entity recognition performance. Experiments on publicly available datasets verify its practicality: compared with the BERT-BiLSTM-CRF model, precision, recall, and F1 score on the People's Daily dataset increase by 0.97%, 0.26%, and 0.61%, respectively.

3) A financial annotation dataset and a financial named entity recognition system are constructed. A certain amount of financial news data was crawled, four entity types were defined, and the data was preprocessed and labeled in the BIO format (see the example below) to build a financial text annotation dataset. Experiments on this dataset show that the proposed model outperforms other models in the financial domain. In addition, to help users understand the entities mentioned in financial news, a Chinese financial news NER system was built so that users can obtain relevant entity information quickly and accurately.
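The layer-fusion idea in point 1) can be illustrated with a minimal sketch in PyTorch and the Hugging Face transformers library: all of BERT's hidden-layer outputs are combined with learnable weights before a BiLSTM produces emission scores for a CRF. The class name, hidden sizes, and the softmax-weighted sum are illustrative assumptions, not the exact configuration used in the thesis.

import torch
import torch.nn as nn
from transformers import BertModel

class FusedBertBiLSTM(nn.Module):
    """Weighted fusion of all BERT layer outputs feeding a BiLSTM tagging head."""
    def __init__(self, num_tags, lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese",
                                              output_hidden_states=True)
        num_layers = self.bert.config.num_hidden_layers + 1   # embedding layer + 12 encoder layers
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))  # one learnable weight per layer
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)
        # In the full model, a CRF layer (e.g. torchcrf.CRF) would decode these emission scores.

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        stacked = torch.stack(out.hidden_states, dim=0)             # (layers, batch, seq, hidden)
        weights = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        fused = (weights * stacked).sum(dim=0)                      # weighted sum over all layers
        lstm_out, _ = self.bilstm(fused)
        return self.emissions(lstm_out)                             # per-token tag scores for the CRF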
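The dual-channel design in point 2) can be sketched in the same way: a BiLSTM channel captures long-distance dependencies, an iterated dilated CNN (IDCNN) channel captures local features, and an attention-style gate reweights the concatenated features before a CRF. Channel widths, dilation rates, and the exact attention form here are assumptions made for illustration.

import torch
import torch.nn as nn

class DualChannelEncoder(nn.Module):
    """BiLSTM + IDCNN dual-channel encoder with an attention gate over the fused features."""
    def __init__(self, in_dim=768, hidden=256, num_tags=9):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        # IDCNN channel: stacked dilated 1-D convolutions with a growing receptive field.
        self.idcnn = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        fused_dim = 2 * hidden + hidden
        self.attention = nn.Linear(fused_dim, 1)        # scores feature importance per token
        self.emissions = nn.Linear(fused_dim, num_tags)

    def forward(self, bert_output):                     # bert_output: (batch, seq_len, in_dim)
        lstm_feat, _ = self.bilstm(bert_output)         # long-distance dependency features
        cnn_feat = self.idcnn(bert_output.transpose(1, 2)).transpose(1, 2)  # local features
        fused = torch.cat([lstm_feat, cnn_feat], dim=-1)
        gate = torch.sigmoid(self.attention(fused))     # weight features by their importance
        return self.emissions(fused * gate)             # emission scores for a downstream CRF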
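The BIO labeling scheme used for the financial dataset in point 3) works as in the short example below; the entity label COM (company) is only an assumed example of one of the four entity types, not necessarily a category defined in the thesis.

# Each character receives a tag: B- opens an entity, I- continues it, O marks non-entity text.
tokens = ["阿", "里", "巴", "巴", "发", "布", "财", "报"]   # "Alibaba releases financial report"
labels = ["B-COM", "I-COM", "I-COM", "I-COM", "O", "O", "O", "O"]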
Keywords/Search Tags: Named entity recognition, BERT, fusion of network layers, dual-channel extraction, financial field