With the development of computer applications in healthcare, electronic medical records (EMRs) have gradually become the key means by which doctors record a patient's condition on admission, during treatment, and at discharge. Because of this role, EMRs contain abundant medical entities and complex relationships between them, and these entities and relationships are the basis for building medical knowledge. This thesis therefore proposes machine learning algorithms for named entity recognition and relation extraction from Chinese electronic medical records.

For entity recognition in Chinese EMRs, this thesis designs an algorithm framework based on a convolutional neural network and a conditional random field (CNN-CRF). To obtain high-quality word embeddings, we add the labeled entities to the segmentation dictionary to improve word segmentation, treat the data set as the corpus, and train the word embeddings unsupervised with word2vec. Because stacking more dilated convolution layers causes over-fitting, we use iterated dilated convolutions (ID-CNN) and apply dropout to randomly discard some of their connections. Finally, the conditional random field refines the classification results. On Chinese EMRs, the proposed method extracts body parts, diseases, symptoms, examinations, and treatments. The experimental results indicate that the proposed approach identifies these entities effectively, with a precision, recall, and F-Measure of 90.01%, 90.62%, and 90.31%, respectively. Compared with traditional methods, the approach improves both accuracy and speed.

For relation extraction, a labeling strategy combined with ID-CNN+CRF is adopted, so that relation extraction is treated simply as a sequence labeling task; the experimental data are processed with the labeling strategy, normalization, branching, and vectorization. Given the distribution of the experimental data, only relationships among body parts, symptoms, and examinations are extracted. The experimental results show that the proposed solution outperforms the compared algorithms. With a small amount of experimental data, the average precision, average recall, and average F-Measure are 63.93%, 63.13%, and 63.40%, respectively. On the one hand, the small number of relation categories contributes to this result; on the other hand, the result suggests that the scheme is suitable for relation extraction.
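The abstract says labeled entities are added to the segmentation dictionary so that word2vec sees medical terms as whole words. It does not name the segmenter, so the sketch below swaps in forward maximum matching (FMM), a simple dictionary-based method, purely to illustrate the effect. The function name `fmm_segment` and the sample entities are hypothetical.

```python
# Forward maximum matching (FMM) with a user dictionary: a swapped-in
# illustration of how adding labeled entity names to a segmentation
# dictionary keeps multi-character medical terms together. The thesis does
# not state which segmenter it used; this is not claimed to be it.
from typing import Iterable, List


def fmm_segment(text: str, dictionary: Iterable[str]) -> List[str]:
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest dictionary word starting at i; fall back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


entities = ["左膝", "疼痛"]  # hypothetical entities taken from the annotations
print(fmm_segment("左膝疼痛三天", entities))
```

Without the entity dictionary every character would become its own token. The segmented sentences would then be fed to a word2vec implementation (for example gensim's `Word2Vec`) for unsupervised training.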
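In a CNN-CRF tagger, the CRF layer refines the per-token classification by choosing the best-scoring tag sequence as a whole. At inference time this is usually done with Viterbi decoding. The sketch below assumes a BIO tag set (hypothetical; the thesis's tagging scheme is not given here), emission scores such as those produced by the convolutional layers, and a tag-to-tag transition matrix. Start and end transitions are left out for brevity.

```python
# Minimal sketch of Viterbi decoding for a linear-chain CRF.
# Assumptions: hypothetical BIO tags, per-token emission scores (e.g. from
# an ID-CNN), learned tag-to-tag transition scores; start/end transition
# scores are omitted for brevity.
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]   : score of tag j at token t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Tags: 0 = O, 1 = B-Symptom, 2 = I-Symptom (hypothetical).
NEG = -1e4
transitions = [[0.0, 0.0, NEG],  # O -> I is not allowed in BIO
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.9, 0.0],    # per-token argmax would pick O ...
             [0.0, 0.0, 1.0]]    # ... then I, an invalid BIO sequence
path, best = viterbi(emissions, transitions)
print(path, best)
```

Taken token by token, the emissions give the invalid sequence O, I. The transition penalty makes Viterbi return B-Symptom, I-Symptom instead, which is the kind of correction the CRF layer contributes.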
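Treating relation extraction as sequence labeling means encoding each relation in per-character tags, so that the same ID-CNN+CRF tagger can predict it. The abstract does not spell out the thesis's labeling strategy, so the scheme below is hypothetical. Each character of an entity taking part in a relation is tagged `B-`/`I-` plus a relation label and an argument role (1 or 2), and every other character is tagged `O`. The relation name `PartSymptom` and the function `relation_tags` are illustrative only.

```python
# Hypothetical tagging scheme: each character of an entity that takes part
# in a relation gets "B-<relation>-<role>" or "I-<relation>-<role>", where
# role 1 is the first argument and role 2 the second; all other characters
# are "O". The relation label "PartSymptom" is illustrative only.
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)


def relation_tags(n_chars: int, relations: List[Tuple[str, Span, Span]]) -> List[str]:
    tags = ["O"] * n_chars
    for relation, first, second in relations:
        for role, (start, end) in (("1", first), ("2", second)):
            for i in range(start, end):
                prefix = "B" if i == start else "I"
                tags[i] = f"{prefix}-{relation}-{role}"  # later relations overwrite earlier ones
    return tags


text = "左膝疼痛"  # "left knee pain": body part 左膝, symptom 疼痛
tags = relation_tags(len(text), [("PartSymptom", (0, 2), (2, 4))])
print(list(zip(text, tags)))
```

One limitation of this simple scheme is that a character holds only one tag. An entity that participates in two relations therefore keeps only the last one written, and overlapping relations would need an additional strategy.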
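The F-Measure reported here is the harmonic mean F = 2PR / (P + R). Recomputing it from the published figures is a quick consistency check. For entity recognition, 90.01% and 90.62% give 90.31%, matching the reported value, which is consistent with the first figure being precision. For relation extraction, the harmonic mean of the averaged precision and recall is about 63.53%, not 63.40%. That gap would be expected if the reported average F-Measure is the mean of per-relation F-Measures rather than the F-Measure of the averages; the averaging scheme is an assumption, as the abstract does not state it.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1), in the same units as the inputs."""
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(90.01, 90.62), 2))  # entity recognition: 90.31
print(round(f_measure(63.93, 63.13), 2))  # relation extraction, from averaged P and R: 63.53
```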