In the medical field, text classification based on Natural Language Processing (NLP) can help medical workers manage and analyze medical data and can even assist in diagnosis and treatment, giving it broad practical application prospects and clinical value. However, most existing studies focus on English electronic medical records, and far fewer address Chinese electronic medical records. Most current Chinese electronic medical records are unstructured text; low utilization, heavy use of symbols, and irregular writing make traditional classification models a poor fit and reduce the accuracy of medical text classification. This paper proposes a capsule network model for electronic medical record classification that combines existing neural network models and relies on the capsule network's routing structure to extract complex features from Chinese medical text. The model alleviates several shortcomings in Chinese medical text classification and offers a new research direction for the task. The specific research contents are as follows:

(1) To address the difficulty traditional neural network models have in using both preceding and following context in Chinese medical texts and in capturing long-distance dependencies, we propose a fusion model based on Recurrent Neural Networks and capsule networks. The model takes word vectors as input, extracts features with a bidirectional Recurrent Neural Network, and concatenates the forward and backward features; the capsule network then computes the relationship between low-level and high-level features and outputs the classification result. Experiments show that this fusion model outperforms the other comparable models, improving the F1 value by 5.46% over the original model. However, because it relies on Word2Vec, it fails to consider the intrinsic connections between words, which limits the model's effectiveness.

(2) To address the irregularity of medical text and the weak word-word and word-sentence associations, we propose a fusion model of Graph Convolutional Networks (GCN) and capsule networks. The model builds a graph from word frequency and word co-occurrence frequency, updates the weights between nodes through the GCN layer, and uses the capsule network to preserve the attributes of nodes and graphs and perform classification. Experimental results show that the model improves the F1 value by at least 4.67% over the other baseline models on the Chinese medical text classification task.

(3) To address the limited range of text feature attributes captured by the two previous models, we propose a fusion model of BERT and capsule networks. By encoding the symbols in the text, it incorporates them into the computation of word embeddings. The multi-head attention mechanism attends to features at different positions to obtain contextual information, giving each word a more accurate representation in its current context. The model effectively handles medical text containing many symbols and polysemous words. According to the experimental results, the fusion model significantly outperforms the other models, reaching an F1 value of 82.3%.
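
The abstract does not state which routing rule the capsule layers use. Purely as an illustration, the sketch below assumes the commonly used routing-by-agreement (dynamic routing) scheme to show how capsule networks can relate low-level features to high-level class capsules. The function names, the toy input `u_hat`, and the choice of three iterations are assumptions made for this example, not details taken from the paper; plain Python is used only to keep the sketch dependency-free.

```python
import math

def squash(s):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]

def softmax(row):
    """Numerically stable softmax over a list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement (assumed variant, not confirmed by the source).

    u_hat[i][j] is the prediction vector that low-level capsule i makes
    for high-level capsule j. Returns one output vector per high-level capsule.
    """
    num_low, num_high, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * num_high for _ in range(num_low)]  # routing logits
    v = []
    for _ in range(num_iters):
        # Coupling coefficients: each low-level capsule distributes its vote.
        c = [softmax(row) for row in b]
        v = []
        for j in range(num_high):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(num_low))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the capsule output.
        for i in range(num_low):
            for j in range(num_high):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v

# Toy input (hypothetical): both low-level capsules agree on class 0
# and contradict each other on class 1.
u_hat = [
    [[1.0, 0.0], [0.5, 0.0]],
    [[1.0, 0.0], [-0.5, 0.0]],
]
v = dynamic_routing(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in v]
predicted = lengths.index(max(lengths))  # the longest capsule is the predicted class
```

In this toy case the two predictions for class 1 cancel out, so its capsule has zero length, while the agreeing predictions for class 0 yield the longest output vector. In a full model, the prediction vectors would instead come from the learned transformation of the BiRNN, GCN, or BERT features described above.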