Research On Multi-feature Ambiguity Resolution Method For Traditional Chinese Medicine Text Segmentation

Posted on:2022-04-04

Degree:Master

Type:Thesis

Country:China

Candidate:F Y Yu

Full Text:PDF

GTID:2504306335972989

Subject:Computer software and theory

Abstract/Summary:

Traditional Chinese Medicine (TCM) medical records are an important carrier for the inheritance and development of TCM. They record information such as the diagnosis of patients' diseases and the rules of TCM. The text that records this information is called TCM text. Exploring and using the effective information contained in TCM text is of great significance for follow-up research and for promoting the development of TCM. To mine this information efficiently, researchers need to process TCM text with natural language processing techniques. Word segmentation is a key step in this process, and the accuracy of its results affects subsequent experiments. Segmentation ambiguity is the main factor that lowers the accuracy of word segmentation in TCM text. To resolve the ambiguity that arises during TCM text segmentation and to improve segmentation precision, this thesis constructs a TCM text segmentation model and a TCM text multi-feature ambiguity resolution model. The main work of this thesis is as follows:

(1) First, TCM medical records are collected and sorted. A total of 20,000 medical records collected from the Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine between 2005 and 2020 are selected as the dataset, and the content of the records is summarized and analyzed. Second, the TCM medicine data and TCM symptom data in the text are standardized. Third, the causes and classification of ambiguous fields are analyzed, and the difficulties in resolving them are summarized. Finally, the multiple features used for ambiguity resolution are selected by analyzing the characteristics of TCM text.

(2) A TCM text segmentation model based on Bi-GRU is proposed. First, the TCM text is annotated with the four-tag BMES scheme (B marks the first character of a word, M a middle character, E the last character, and S a single-character word). After annotation, the text is vectorized with the Word2vec method to obtain text vectors. Second, the text vectors are fed into the Bi-GRU neural network, which captures information in both the forward and backward directions and produces the possible labels for each character. Finally, the Viterbi algorithm selects the label sequence with the highest probability as the final word segmentation result.

(3) A multi-feature ambiguity resolution model for TCM text is proposed. Focusing on combined ambiguity among the types of segmentation ambiguity, a TF-IDF algorithm extended with word length is used to compute the weight feature, and the context word features and part-of-speech features within the text window around each ambiguous field are extracted according to the linguistic characteristics of TCM text, which is concise, fuzzy, and unstructured. The weight feature, context word features, and part-of-speech features are combined into a multi-feature vector, which is input into a nonlinear support vector machine to construct a "combination" classifier and a "division" classifier that produce the segmentation result for ambiguous fields.

Comparative experiments are carried out to verify the performance of the TCM text segmentation model and the TCM text multi-feature ambiguity resolution model designed in this thesis. The experimental results show that the accuracy of the word segmentation method reaches 93.26%, and that the accuracy after ambiguity resolution reaches 94.75%, which indicates that the proposed methods are feasible and effective.
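The BMES annotation and Viterbi decoding described in item (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`words_to_bmes`, `bmes_to_words`, `viterbi`), the hard-coded transition table, and the hand-written per-character probabilities are all assumptions made for illustration, and the probabilities stand in for what a Bi-GRU tagger would output. The example phrase is likewise illustrative and not taken from the thesis dataset.

```python
import math

TAGS = ["B", "M", "E", "S"]

# Assumed hard constraints: a word must be closed (E or S) before a new one starts.
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}


def words_to_bmes(words):
    """Label every character of pre-segmented words with B, M, E or S."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their BMES tags."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ("E", "S"):
            words.append(buf)
            buf = ""
    if buf:  # tolerate a sequence that ends inside a word
        words.append(buf)
    return words


def viterbi(scores):
    """Return the highest-scoring legal tag path.

    scores: one dict per character mapping each tag to a log-score
    (a stand-in for the tagger output).
    """
    if not scores:
        return []
    n = len(scores)
    neg = -math.inf
    best = [{t: neg for t in TAGS} for _ in range(n)]
    back = [{t: None for t in TAGS} for _ in range(n)]
    for t in ("B", "S"):  # a sentence can only start a word
        best[0][t] = scores[0][t]
    for i in range(1, n):
        for prev in TAGS:
            if best[i - 1][prev] == neg:
                continue
            for cur in ALLOWED[prev]:
                cand = best[i - 1][prev] + scores[i][cur]
                if cand > best[i][cur]:
                    best[i][cur] = cand
                    back[i][cur] = prev
    last = max(("E", "S"), key=lambda t: best[n - 1][t])  # must end a word
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Illustrative phrase: "头痛" (headache) + "发热" (fever).
print(words_to_bmes(["头痛", "发热"]))  # ['B', 'E', 'B', 'E']

# Hypothetical per-character tag probabilities. A greedy argmax would pick
# B at the second character, giving the illegal sequence B B.
probs = [
    {"B": 0.60, "M": 0.10, "E": 0.10, "S": 0.20},
    {"B": 0.40, "M": 0.10, "E": 0.35, "S": 0.15},
    {"B": 0.70, "M": 0.10, "E": 0.10, "S": 0.10},
    {"B": 0.10, "M": 0.10, "E": 0.60, "S": 0.20},
]
scores = [{t: math.log(p) for t, p in row.items()} for row in probs]
tags = viterbi(scores)
print(tags)                             # ['B', 'E', 'B', 'E']
print(bmes_to_words("头痛发热", tags))  # ['头痛', '发热']
```

The sketch uses fixed legal/illegal transitions rather than learned transition probabilities; a trained model would typically supply both emission and transition scores, but the decoding recursion has the same shape.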

Keywords/Search Tags:

Bi-GRU, multi-feature, support vector machine, combined ambiguity field, TCM text

Related items

1	Research On Medical Image Mining Based On Improved Multi Kernel Support Vector Machine
2	Research On Encephalic Tissue Recognition For MR Image Based On Support Vector Machine
3	Research On Brain Image Analysis Based On Sparse Structural Feature Learning And Their Applications
4	Support Vector Machine And Its Application In Electrocardiogram Classification
5	The Research On J Wave Diagnosis Techniques Based On Support Vector Machine
6	Application Of Support Vector Machine In Prediction Of Diabetes Genetic Risk
7	Application Of Multi-label Support Vector Machine In X-RAY Lung Disease Detection
8	Research On Risk Prediction Of Diabetes Based On Random Forest And Support Vector
9	Prediction Research Of Pancreatic Cancer Markers Based On Support Vector Machine
10	Cancer Diagnosis By Using Support Vector Machine