In recent years, the number of hospital visits has been increasing. In 2018, the total number of patient visits to medical and health institutions nationwide reached 8.31 billion, an average of 6.0 visits per person, and many patients could not receive timely or high-quality medical treatment. This problem is particularly prominent in primary hospitals, where the shortage of medical resources is a typical manifestation of the current imbalance between medical resources and demand. Many patients cannot be seen in time because of limited beds and limited standards of care, and the professional level of medical staff in primary hospitals is uneven, so the quality of treatment is difficult to guarantee. With the spread of computers and the steady advance of computer technology, more and more hospitals have begun to keep electronic medical records and have generated a large amount of valuable medical data. At the same time, natural language processing and other machine learning models have made great progress. Together, these provide the data and the technical support for processing electronic medical records and for intelligent diagnosis. This paper therefore focuses on constructing diagnostic models based on electronic medical records. The specific work includes the following parts:

(1) Construction of a corpus and an entity recognition model. This paper first constructs a corpus for entity recognition. The corpus contains five types of entities: symptoms, treatments, abnormal examination results, diseases, and examinations. Entity recognition techniques are then studied, and on that basis the performance of entity recognition on different electronic medical records is compared. Entity recognition is treated as a typical sequence labeling task, so the models compared are CRF, BERT, LSTM, and LSTM-CRF. LSTM can retain context over longer sentences, while CRF models the correlation between adjacent sequence tags and so further improves recognition. The word vectors of the LSTM-CRF model are also improved by combining word-based vectors with character-based vectors, which gives the best results. Character features are extracted with LSTM and CNN: LSTM mainly extracts the sequence features of the characters in each word, and CNN extracts their local features, that is, n-gram features.

(2) Construction of a readmission diagnosis model. A large number of patients are currently readmitted, which greatly increases medical expenses. We propose a readmission risk diagnosis model that uses LSTM to extract features of a patient's hospitalization and adds basic patient information to diagnose whether the patient is likely to be readmitted. We also compare it with a CNN-based readmission model.

(3) Construction of a disease diagnosis model. After surveying a large number of disease diagnosis models, we compare the performance of different models on disease diagnosis, including traditional machine learning models and deep learning models. For traditional machine learning, decision trees, random forests, Bayesian networks, perceptrons, k-nearest neighbors, and multi-model fusion methods are used. For deep learning, we compare three methods (deep belief networks, convolutional neural networks, and deep neural networks) and introduce different graph representation learning methods. In addition to the attention mechanism, we compare the performance of different embeddings.
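
The sequence-labeling framing in part (1) can be illustrated with a small sketch. It is not the paper's implementation: the BIO tagging scheme, the label strings, the helper names `spans_to_bio` and `char_ngrams`, and the example sentence are all assumptions made for illustration, since the abstract does not say which tagging scheme or tokenization is used. The first helper turns annotated entity spans into one tag per token, which is the form of target a CRF or LSTM-CRF tagger is typically trained to predict. The second lists the character n-grams of a word, the kind of local window that a character-level CNN filter of width n slides over.

```python
from typing import List, Tuple

# The five entity types named in the abstract. The label strings are hypothetical.
ENTITY_TYPES = {"symptom", "treatment", "abnormal_result", "disease", "examination"}


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, type) into one BIO tag per token.

    The BIO scheme is an assumption; the abstract only says "sequence labeling".
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: ({start}, {end})")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: ({start}, {end})")
        tags[start] = "B-" + etype            # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype            # remaining tokens of the entity
    return tags


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """List the character n-grams of a word, with '<' and '>' as boundary markers.

    This only enumerates the windows; a CNN would learn filters over them.
    """
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the paper's corpus.
    tokens = ["Patient", "reports", "chest", "pain", "and", "underwent", "ECG", "."]
    spans = [(2, 4, "symptom"), (6, 7, "examination")]
    for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
        print(f"{token}\t{tag}")
    print(char_ngrams("fever"))
```

In a tagger of the kind described, each token's word vector would be concatenated with a vector computed from its characters before being passed to the LSTM, and the CRF layer would score whole tag sequences so that, for example, an `I-` tag is unlikely to follow `O`.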