
Research on Disease Identification Based on Chinese Word Embedding Technology

Posted on: 2022-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Gao
GTID: 2504306557468804
Subject: Computer software and theory
Abstract/Summary:
In deep learning, word embedding techniques based on distributed word representations can exploit the internal semantics of a corpus to effectively improve the performance of neural network models, and they have been widely applied in disease recognition research. Because existing word embedding techniques based on distributed word representations cannot capture the internal semantics of Chinese, this thesis studies one-hot encoding and the Word2Vec model, and on this basis further studies deep learning techniques such as feedforward neural networks, recurrent neural networks, convolutional neural networks, and the attention mechanism. The main contents and innovations are summarized as follows:

(1) Existing word segmentation techniques designed for Latin-script languages do not suit the characteristics of Chinese. To address this, a Strokes Method of "Zha" is proposed, based on an in-depth study of neural network language models for Latin-script languages, the Word2Vec model, the Character-enhanced Word Embedding model, and the characteristics of Chinese word formation. Compared with the segmentation technique used in the Character-enhanced Word Embedding model, the proposed technique extracts the semantic information of Chinese characters more effectively.

(2) To fully capture the internal semantic information of Chinese, a Chinese Word Embedding Model Based on Strokes Method of "Zha" is proposed. In the word similarity task, compared with the Continuous Bag-of-Words (CBOW) model, the F1-Score of the proposed model improves by 3.4% on wordsim-240 and by 8.3% on wordsim-296. In the word analogy task, compared with the CBOW model, the 3CosAdd and 3CosMul scores of the proposed model improve by 23.8% and 26.4%, respectively. These results indicate that the proposed model captures the internal semantic information of Chinese better.

(3) To achieve better recognition, a Bi-directional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model is built on top of the Chinese Word Embedding Based on Strokes Method of "Zha", and experiments are carried out on a collected disease diagnosis data set. Under optimized parameters, the precision, recall, and F1-Score of the proposed model improve by 4.6%, 3.4%, and 3.9%, respectively, over the baseline BiLSTM-CRF model. The proposed model uses the internal semantic information of Chinese to improve recognition accuracy and to reduce errors caused by word order, which makes it better suited to disease recognition in the Chinese domain.
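The word-analogy metrics 3CosAdd and 3CosMul reported in the abstract answer questions of the form "a is to a* as b is to ?" by scoring every other vocabulary word x. The sketch below uses the definitions as they are commonly stated in the word-embedding literature: 3CosAdd maximizes cos(x, b) - cos(x, a) + cos(x, a*), and 3CosMul maximizes cos(x, b) * cos(x, a*) / (cos(x, a) + eps) after shifting each cosine into [0, 1]. It is a generic evaluator, not the thesis's own evaluation code; the function names, the eps value of 1e-3, and the toy 4-dimensional vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def _shift(c):
    """Map a cosine from [-1, 1] into [0, 1], as 3CosMul requires."""
    return (c + 1) / 2


def analogy(emb, a, a_star, b, method="add", eps=1e-3):
    """Answer 'a is to a_star as b is to ?' by scoring every other word."""
    if method not in ("add", "mul"):
        raise ValueError("method must be 'add' or 'mul'")
    best_word, best_score = None, -math.inf
    for word, vec in emb.items():
        if word in (a, a_star, b):  # query words are excluded from the answers
            continue
        c_a, c_a_star, c_b = (cosine(vec, emb[w]) for w in (a, a_star, b))
        if method == "add":
            # 3CosAdd: cos(x, b) - cos(x, a) + cos(x, a*)
            score = c_b - c_a + c_a_star
        else:
            # 3CosMul: cos(x, b) * cos(x, a*) / (cos(x, a) + eps), on shifted cosines
            score = _shift(c_b) * _shift(c_a_star) / (_shift(c_a) + eps)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(emb, questions, method="add"):
    """Fraction of (a, a*, b, expected) questions answered correctly."""
    hits = sum(analogy(emb, a, a_star, b, method) == expected
               for a, a_star, b, expected in questions)
    return hits / len(questions)


# Hypothetical toy vectors; dimensions are [royalty, gender, person, fruit]
EMB = {
    "男人": [0.0, 1.0, 1.0, 0.0],   # man
    "女人": [0.0, -1.0, 1.0, 0.0],  # woman
    "国王": [1.0, 1.0, 1.0, 0.0],   # king
    "女王": [1.0, -1.0, 1.0, 0.0],  # queen
    "苹果": [0.0, 0.0, 0.0, 1.0],   # apple (distractor)
}
QUESTIONS = [
    ("男人", "女人", "国王", "女王"),  # man : woman :: king : queen
    ("国王", "女王", "男人", "女人"),  # king : queen :: man : woman
]

for m in ("add", "mul"):
    print(m, analogy_accuracy(EMB, QUESTIONS, m))  # prints "add 1.0", then "mul 1.0"
```

On real embeddings, the accuracy of each method over a benchmark's analogy questions is what a comparison against CBOW would measure. 3CosMul tends to dampen the influence of any single large cosine term, which is one reason the two metrics can rank models differently.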
Keywords/Search Tags: Chinese Word Embedding Technology, Auxiliary Diagnosis and Treatment System, Disease Identification, BiLSTM, CRF