Research on named entity recognition (NER) in Chinese medical text is of great significance to medical information extraction. However, labeled data is difficult to obtain in the medical field, so the development of Chinese medical NER has been limited by the problem of low resources. Low resource means a lack of labeled data, which can seriously hurt the performance and generalization of a model. To cope with the lack of labeled data in low-resource scenarios, this paper proposes two Chinese medical named entity recognition methods. The main work is as follows.

(1) Incorporating a lexicon into self-training: a distantly supervised Chinese medical NER method (LSCNER) is proposed. First, a self-training-based high-recall entity method is proposed to effectively recall potential unlabeled entities. Second, a scoring and ranking method based on fine-grained dictionary enhancement is proposed to model the unique internal structure of medical entities; the recalled entities can then be screened, effectively reducing the false entities produced by the high-recall step. In addition, this paper constructs a Chinese medical NER dataset, CDD. The experimental results show that on the CDD dataset constructed in this paper and the public dataset CCKS 2019, LSCNER improves F1 by 3.20% and 5.03%, respectively, compared with the baseline model.

(2) Enhancing both text and labels: a Chinese medical NER method (TLCNER) is proposed. The method uses pre-trained language models and semi-supervised learning to optimize along both the text and label dimensions. First, a text-enhanced Chinese medical NER method based on a pre-trained language model is proposed. This paper collects 200,000 medical texts from the Internet and continues pre-training a public pre-trained model for medical-domain adaptation; the texts of the two public datasets are augmented and used for further pre-training for task adaptation. Second, a semi-supervised label-enhanced Chinese medical NER method is proposed. The semi-supervised learning method processes unlabeled data to obtain pseudo-labeled data, which is added to the original training data to improve labeling diversity. Finally, on two low-resource public datasets, TLCNER improves F1 by 2.68% and 3.66%, respectively, compared to the BERT-base model.
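The LSCNER pipeline described above (high-recall candidate generation followed by dictionary-based scoring and screening) can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the over-generating candidate step, the character-overlap scoring function, the tiny lexicon, and the 0.8 threshold are all assumptions chosen to make the filtering pattern concrete.

```python
# Toy sketch of the LSCNER idea: recall many candidate entities, then use a
# fine-grained lexicon score to screen out false entities. All names, scores,
# and the three-entry dictionary below are illustrative assumptions.

MEDICAL_DICT = {"糖尿病", "胰岛素", "高血压"}  # tiny stand-in medical lexicon

def recall_candidates(sentence):
    """High-recall step: propose every substring of length 2..4 as a
    candidate entity (a deliberately over-generating stand-in for a
    self-training teacher's low-threshold predictions)."""
    cands = set()
    for i in range(len(sentence)):
        for j in range(i + 2, min(i + 5, len(sentence) + 1)):
            cands.add(sentence[i:j])
    return cands

def dict_score(span):
    """Fine-grained dictionary score: 1.0 for an exact lexicon hit,
    otherwise the best character-overlap ratio with any lexicon entry."""
    if span in MEDICAL_DICT:
        return 1.0
    return max((len(set(span) & set(w)) / len(set(w)) for w in MEDICAL_DICT),
               default=0.0)

def screen(cands, threshold=0.8):
    """Scoring-and-ranking filter: keep only high-scoring candidates,
    reducing the false entities introduced by the high-recall step."""
    ranked = sorted(cands, key=dict_score, reverse=True)
    return [c for c in ranked if dict_score(c) >= threshold]

sentence = "患者患有糖尿病并注射胰岛素"
kept = screen(recall_candidates(sentence))
```

In a real system the recall step would come from a trained teacher model's low-confidence predictions and the score would combine model confidence with lexicon features, but the keep/discard structure is the same.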
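The semi-supervised label-enhancement step of TLCNER (pseudo-label unlabeled text, keep confident predictions, merge into the training set) can be sketched as below. The confidence function, the 0.9 threshold, and the toy model are illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch of TLCNER's label enhancement: a trained model pseudo-labels
# unlabeled sentences; only confident predictions are added to the training
# data. The threshold and toy model here are illustrative assumptions.

def pseudo_label(model, unlabeled, threshold=0.9):
    """Return (sentence, tags) pairs whose mean per-token confidence clears
    the threshold; low-confidence predictions are discarded to limit noise
    in the augmented training set."""
    accepted = []
    for sent in unlabeled:
        tags, confidences = model(sent)  # BIO tags + per-token confidences
        if sum(confidences) / len(confidences) >= threshold:
            accepted.append((sent, tags))
    return accepted

def toy_model(sent):
    """Stand-in for a trained NER model: tags one known disease mention
    with high confidence, everything else with low confidence."""
    tags = ["O"] * len(sent)
    start = sent.find("糖尿病")
    conf = [0.95] * len(sent) if start >= 0 else [0.5] * len(sent)
    if start >= 0:
        tags[start] = "B-DIS"
        tags[start + 1:start + 3] = ["I-DIS", "I-DIS"]
    return tags, conf

train = [("患者血压正常", ["O"] * 6)]           # original labeled data
unlabeled = ["确诊为糖尿病", "今日天气晴朗"]      # raw unlabeled text
train += pseudo_label(toy_model, unlabeled)      # augmented training set
```

Only the confidently labeled sentence is merged; the low-confidence one is dropped, which is what keeps pseudo-labeling from flooding the training data with noisy labels.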