Research On New Word Discovery And Entity Recognition Of Chinese Electronic Medical Records

Posted on:2021-04-06

Degree:Master

Type:Thesis

Country:China

Candidate:T Jiang

Full Text:PDF

GTID:2404330614960450

Subject:Computer technology

Abstract/Summary:

New word discovery and named entity recognition are two important research topics in the field of data mining. New word discovery can recognize out-of-vocabulary words and improve the accuracy of Chinese word segmentation. Named entity recognition can accurately identify various types of named entities and is one of the most important techniques for constructing a knowledge graph. Chinese electronic medical records are the professional records that medical staff keep of the entire process of a patient’s consultation. Because the text contains a large amount of real clinical knowledge, it has attracted the attention of researchers. Using natural language processing to mine this knowledge fully will greatly promote the development of medical informatization. The research work in this thesis is as follows:

(1) We propose an improved new word discovery method. The method first performs unsupervised pre-segmentation based on the N-gram model, and then uses word frequency, mutual information, and branch entropy as the main features for new word discovery. After obtaining candidate words, we apply grid search to find the optimal combination of feature thresholds. On corpora from four different fields, we compare the improved method with a baseline that pre-segments the text using general-purpose tools. The experimental results show that the method has good domain adaptability. On the electronic medical record corpus in particular, the accuracy of the top 10% of new words reached 85.9%, significantly exceeding the comparison method.

(2) For named entity recognition in Chinese electronic medical records, we propose an improved method. The method first uses unsupervised new word discovery to build a domain dictionary that improves the accuracy of Chinese word segmentation, and then uses the BI-LSTM-CRF method for named entity recognition. Experiments on the electronic medical record corpus show that the entity F1-Measure increased by 1.46% after the medical domain dictionary was added.

(3) To address the shortage of high-quality annotated text in the field of electronic medical records, this thesis proposes a named entity recognition method that incorporates the BERT model. The method uses BERT to vectorize the text and uses BI-LSTM-CRF as the fine-tuning method for entity recognition. The experiments compare entity recognition results across different training methods, different fine-tuning methods, and with and without further pretraining of the language model. The results show that the best method uses BERT as the language model with BI-LSTM-CRF fine-tuning. The F1-Measure of entity recognition reaches 83.39%, and further pretraining the language model improves the F1-Measure by about 0.54%.
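The three features named in contribution (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes plain character n-grams in place of the N-gram pre-segmentation step, takes the lowest pointwise mutual information over all binary splits of a candidate, and uses the smaller of the left and right neighbour entropies as the branch entropy. The function names and the threshold defaults (`min_freq`, `min_mi`, `min_entropy`) are hypothetical; in the thesis the thresholds are chosen by grid search.

```python
import math
from collections import Counter, defaultdict


def candidate_stats(text, max_n=4):
    """Count every character n-gram (1..max_n) and its left/right neighbours."""
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            counts[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + n < len(text):
                right[gram][text[i + n]] += 1
    return counts, left, right


def entropy(neighbours):
    """Shannon entropy (natural log) of a neighbour-character distribution."""
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in neighbours.values())


def min_pmi(gram, counts, total_chars):
    """Lowest pointwise mutual information over all binary splits of gram."""
    p_gram = counts[gram] / total_chars
    return min(
        math.log(
            p_gram
            / ((counts[gram[:k]] / total_chars) * (counts[gram[k:]] / total_chars))
        )
        for k in range(1, len(gram))
    )


def discover(text, min_freq=2, min_mi=1.0, min_entropy=0.5, max_n=4):
    """Return (candidate, frequency, mutual information, branch entropy) tuples.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    counts, left, right = candidate_stats(text, max_n)
    total_chars = len(text)
    found = []
    for gram, freq in counts.items():
        if len(gram) < 2 or freq < min_freq:
            continue
        mi = min_pmi(gram, counts, total_chars)
        branch = min(entropy(left[gram]), entropy(right[gram]))
        if mi >= min_mi and branch >= min_entropy:
            found.append((gram, freq, mi, branch))
    return sorted(found, key=lambda row: row[1], reverse=True)
```

Calling `discover("患者头痛三天，头痛加重。无头痛史，偶头痛！")` scores each repeated character sequence in the sentence; a candidate is kept only when it is frequent, internally cohesive (high mutual information), and appears in varied contexts (high branch entropy).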

Keywords/Search Tags:

Chinese electronic medical records, natural language processing, new word discovery, named entity recognition

Related items

1	Research On Named Entity Recognition Of Electronic Medical Records Based On BERT Model
2	Chinese Electronic Medical Record Medical Entity Recognition Algorithm
3	Named Entity Recognition Of Electronic Medical Records Based On Deep Learning
4	Research And Application Of Key Techniques For Named Entity Recognition Of Electronic Medical Records Based On Deep Neural Network
5	A Comparative Study Of Named Entity Recognition In The Recognition Of Traditional Chinese Medicine Nouns And Prescription Nouns
6	Deep Learning-based Recognition Of Named Entities In Chinese Electronic Medical Records
7	Study On Named Entity Recognition Of Chinese Electronic Medical Record Based On Deep Learning
8	Construction Of Knowledge Graph For Stroke Electronic Medical Records Based On Deep Learning
9	Research On Named Entity Recognition Of Chinese Electronic Medical Records
10	Research On Medical Knowledge Extraction In Electronic Medical Records Based On Deep Learning