Research And Implementation Of Disambiguation Algorithm For Medical Records Of TCM

Posted on: 2021-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: B Wang
GTID: 2404330602464607
Subject: Engineering
Abstract/Summary:
Traditional Chinese Medicine (TCM) medical records are the most direct evidence of clinical diagnosis and treatment by TCM doctors. They record information such as patient symptoms and the doctor's diagnosis. Extracting, analyzing, and utilizing the diagnosis and treatment information in these records is of great significance for the development of TCM. Natural language processing (NLP) technology is needed to mine important diagnosis and treatment information efficiently from massive collections of TCM medical records. Chinese word segmentation is a key step in NLP, and its quality has an important impact on Chinese medicine text processing. However, ambiguous words in TCM text seriously reduce the accuracy of TCM word segmentation and hinder the development of TCM information processing technology. To resolve combined ambiguity in TCM text and improve segmentation accuracy, this thesis constructs a TCM text disambiguation model and a TCM text word segmentation model. The effectiveness and efficiency of the models were verified on 20,000 medical records collected from the Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine from 2017 to 2019. The main work of this thesis is as follows:

(1) TCM medical records are standardized and the characteristics of TCM text are analyzed. First, records that lack four-diagnosis information or have disordered text formatting are removed from the data set. Then, following the requirements of the "Basic Rules for the compile of Medical Records", incorrect words are corrected and adulterated words are revised. The symptoms described in the records are standardized while the characteristics of TCM are retained. Finally, the linguistic and structural characteristics of TCM medical records are combined to analyze their distinguishing features and writing rules.

(2) The BI-LSTM-CRF algorithm is applied to TCM text word segmentation, and a TCM text word segmentation model based on BI-LSTM-CRF is constructed. First, the TCM medical records are standardized, and the four-diagnosis information in the records is extracted as the training and test corpus for the word segmentation model. Then the word2vec method is used to vectorize the experimental data, and the text vectors are input into the BI-LSTM neural network, whose forward and backward LSTM layers automatically learn text features and model the input text vectors. Next, a CRF layer serves as the output layer of the model and generates the corresponding label sequence. Finally, the word segmentation results for the TCM text are obtained.

(3) A feature selection method that incorporates part-of-speech mutual information is proposed, and a TCM text word disambiguation model is established. First, a word frequency factor is added to traditional mutual information to eliminate the influence of low-frequency words on the mutual information value. Second, the parts of speech of the TCM text are used as context features, and a feature selection method based on part-of-speech mutual information is established. Then a mutual information vector is constructed from the word-frequency mutual information and the part-of-speech mutual information. Finally, the TCM text word disambiguation model is built by feeding the mutual information vector into a support vector machine.

To verify the performance of the TCM text word segmentation model, the thesis compares it with other word segmentation methods. The experimental results show that the BI-LSTM-CRF method gives better segmentation performance, with a TCM text word segmentation accuracy of 93.25%. To verify the performance of the TCM text word disambiguation model, experiments are conducted from multiple angles. The results show that the proposed feature selection method performs better than the other feature selection methods tested, and the disambiguation accuracy of the model reaches 95.13%. After the disambiguation model is added, the accuracy of the BI-LSTM-CRF word segmentation method reaches 94.68%.
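The feature construction in item (3) can be illustrated with a small Python sketch. The abstract does not give the thesis's formulas, so everything concrete here is an assumption: pointwise mutual information stands in for the "traditional" mutual information, the `joint / (joint + 1)` discount is a hypothetical frequency factor, and the function names and toy data are invented for illustration. The sketch only shows how a frequency factor can shrink scores from rarely seen pairs and how the scores can be laid out as a vector for a classifier such as a support vector machine.

```python
import math
from collections import Counter


def weighted_pmi(pairs):
    """Return {(word, tag): frequency-discounted PMI} for the observed pairs."""
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    tag_counts = Counter(tag for _, tag in pairs)
    total = len(pairs)

    scores = {}
    for (word, tag), joint in pair_counts.items():
        # Pointwise mutual information: log2( p(w, t) / (p(w) * p(t)) ).
        pmi = math.log2(joint * total / (word_counts[word] * tag_counts[tag]))
        # Hypothetical frequency factor (not from the thesis): pairs seen
        # only a few times have their score pulled toward zero.
        discount = joint / (joint + 1)
        scores[(word, tag)] = discount * pmi
    return scores


def feature_vector(word, tag_order, scores):
    """Lay out the scores for one word in a fixed tag order; unseen pairs get 0.0."""
    return [scores.get((word, tag), 0.0) for tag in tag_order]


# Toy observations (invented): a candidate reading of an ambiguous string,
# paired with the part-of-speech tag of a neighbouring word.
observations = (
    [("joined", "n")] * 8 + [("joined", "v")] * 2
    + [("split", "v")] * 8 + [("split", "n")] * 2
)
scores = weighted_pmi(observations)
vector = feature_vector("joined", ["n", "v"], scores)
```

In this toy data the "joined" reading co-occurs mostly with the tag "n", so the first component of `vector` is positive and the second is negative. A vector of this kind, one per ambiguous occurrence, is the sort of input that could then be passed to a support vector machine.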
Keywords/Search Tags: Part-of-Speech Mutual Information, Support Vector Machine, BI-LSTM-CRF, Chinese Word Segmentation, Combined Ambiguity, Medical Records of TCM