| The electronic medical record is the information recorded about a patient from admission to discharge, including the cause of the disease, the treatment methods, the medical care given, and the examination measures. This information not only supports the patient's recovery but also provides valuable reference material for future medical treatment. Electronic medical records generally contain image information and text information, of which text is the most common. Most of the text written in hospital records is unstructured. It is of great significance to scientific research and clinical practice, but it is difficult to process with information extraction. The most basic and critical step of text information extraction is named entity recognition. Named entity recognition is relatively mature in the general domain but is not yet effective in the medical domain. Adapting named entity recognition to the medical domain is therefore a key technology for structuring Chinese electronic medical record information. The work in this paper consists of the following points: 1) Improving the statistics-based Chinese word segmentation method with an embedded Viterbi algorithm. On top of the statistics-based segmentation method, a dedicated dictionary of Chinese electronic medical record terms, collected and curated for this purpose, is loaded so that medical proper nouns are neither mis-segmented nor omitted when the records are segmented. The Viterbi algorithm is embedded in the statistical computation of the optimal segmentation sequence, so that each step keeps only the optimal segmentation path, which reduces the amount of computation and saves time for subsequent research work and data labeling. 2) Data cleaning and improved labeling
methods. The text of the Chinese electronic medical records is filtered and cleaned, and invalid text is removed. After Chinese word segmentation, a batch of Chinese electronic medical records was annotated with BIO notation; once annotation was complete, functions written in Python converted the annotation from BIO to BIOES. At the same time, a dedicated Chinese electronic medical record dictionary was compiled by collecting relevant medical vocabulary from existing medical record text and the Internet. 3) Designing a named entity recognition algorithm for Chinese electronic medical records. The first part of the algorithm is a convolutional neural network, whose convolution operations extract features from the data. After feature extraction, the features are passed to the second part of the algorithm, which extracts contextual features. Finally, the result is passed to the conditional random field in the third part of the algorithm, which performs entity recognition. 4) Improving the convolutional neural network based on the Inception structure and the ResNet structure. The improved network combines parallel and serial connections for learning and feature extraction. For the same convolution effect, this structure reduces both the number of parameters and the computational cost of the network. 5) Studying the BERT language model to generate word vectors. Word vectors generated by the pre-trained language model replace the network's randomly initialized vectors. These word vectors greatly enrich features such as the semantics and position of each word in the text, which helps the deep learning model converge faster during training and achieve the desired effect. |
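
The BIO-to-BIOES conversion mentioned in point 2) can be illustrated with a minimal Python sketch. It is not the function written for this work: the name `bio_to_bioes` and the entity labels `DIS`, `SYM`, and `DRUG` are hypothetical, and the input is assumed to be a well-formed BIO tag sequence for one sentence.

```python
from typing import List


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert one sentence's BIO tags to BIOES tags (illustrative sketch)."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # Look one tag ahead; the end of the sentence behaves like "O".
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # A B- tag with no continuation is a single-token entity.
            bioes.append(f"B-{label}" if continues else f"S-{label}")
        else:
            # An I- tag with no continuation closes the entity.
            bioes.append(f"I-{label}" if continues else f"E-{label}")
    return bioes


# Hypothetical labels: DIS (disease), SYM (symptom), DRUG (medication).
example = ["B-DIS", "I-DIS", "I-DIS", "O", "B-SYM", "O", "B-DRUG", "I-DRUG"]
converted = bio_to_bioes(example)
```

The extra `E-` and `S-` tags mark entity boundaries explicitly, which is the usual motivation for preferring BIOES over BIO when training a sequence labeler.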
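
The dictionary-plus-Viterbi segmentation described in point 1) can be sketched as a dynamic program that keeps only the best-scoring path to each character position. This is a simplified illustration under stated assumptions, not the method of this work: it assumes a unigram model over dictionary word frequencies, and the function name `viterbi_segment`, the toy frequencies, the `max_len` limit, and the penalty for unknown single characters are all hypothetical.

```python
import math
from typing import Dict, List


def viterbi_segment(text: str, freq: Dict[str, int], max_len: int = 6) -> List[str]:
    """Segment text by maximizing the sum of unigram log-probabilities."""
    total = sum(freq.values())
    # Assumed smoothing: an unknown single character gets a small probability.
    unk_logp = math.log(1.0 / (total + 1))
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score of any path ending at i
    back = [0] * (n + 1)          # back[i]: start index of the last word on that path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in freq:
                logp = math.log(freq[word] / total)
            elif end - start == 1:
                logp = unk_logp
            else:
                continue  # multi-character strings outside the dictionary are skipped
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy dictionary with made-up frequencies; "高血压" (hypertension) is the medical term.
toy_freq = {"患者": 50, "有": 80, "高": 20, "血压": 40, "高血压": 30, "病史": 30}
segments = viterbi_segment("患者有高血压病史", toy_freq)
```

Because only the best path to each position is stored, the cost grows linearly with the text length (times `max_len`) instead of with the number of possible segmentations, and a dictionary entry such as "高血压" is kept whole whenever it scores higher than its parts.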