
Research And Application Of Electronic Medical Record Data Extraction And Mining Technology For Cancer

Posted on: 2020-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
Full Text: PDF
GTID: 2404330590953162
Subject: Software engineering
Abstract/Summary:
Cancer is one of the major threats to human life and one of the long-standing hard problems in medicine, so research on it matters greatly. The key to overcoming cancer lies in early detection and treatment, which improve the cure rate. As big data technology is applied more widely in the medical field, using it to assist the diagnosis of cancer becomes increasingly valuable. This thesis explores data extraction and mining techniques for the electronic medical records of tumor patients and, on that basis, builds a medical assistant diagnosis system, with the aim of improving the accuracy and efficiency of the system and ultimately assisting doctors in diagnosis.

The research object is 5 GB of electronic medical record data from the oncology department of a tertiary hospital in one city. Data mining and analysis experiments are carried out on this data, focusing on the two stages that most affect the mining results: data extraction and exploratory mining experiments. A medical assistant diagnosis system is then designed and developed from the research results. The research contents are as follows:

(1) In the data extraction stage, the reverse maximum matching algorithm, which has good overall performance, is selected for Chinese word segmentation and then improved. The improved algorithm raises the accuracy and efficiency of word segmentation on cancer data and lays a sound data foundation for the entity extraction stage.

(2) For entity recognition, a named entity recognition method for Chinese electronic medical records based on conditional random fields with multi-feature fusion is adopted. Five features, including the word feature, the part-of-speech feature and the medical terminology lexicon feature, are fused in turn for medical entity recognition. Seven custom external semantic lexicons are built as part of the feature support. Experiments show that the selected features are feasible and effective, and that the fused features improve the accuracy of entity recognition.

(3) In the exploratory mining stage, the C4.5 and BP neural network algorithms, chosen for their relatively good classification performance, are first used for classification mining experiments. Second, to address the shortcomings found in those experiments, a rough set algorithm is used for attribute reduction. Finally, the C4.5 and BP neural network classification experiments are run again on the reduced attributes. These four groups of experiments yield the following results: 1) the performance of the two algorithms is compared; 2) the validity of attribute reduction is verified; 3) attribute reduction is found to benefit C4.5 more, and C4.5 after attribute reduction is concluded to be the more suitable algorithm for classification mining of tumor electronic medical records.

(4) The medical assistant diagnosis system is designed and developed, and the results of the data extraction and mining work are embedded in it, forming an automated medical assistant diagnosis system based on big data technology.

In summary, given the characteristics of cancer medical data, this thesis arrives at a more suitable classification mining method for tumor electronic medical records by optimizing and improving each step of the data mining process, and designs and develops a medical assistant diagnosis system on that basis. The system can assist doctors in diagnosing cancer, help patients detect cancer as early as possible, and improve the cure rate of cancer patients.
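As an illustration of the segmentation step in item (1), the following is a minimal sketch of the baseline reverse maximum matching algorithm. It does not reproduce the thesis's improvements, which the abstract does not describe. The function name `reverse_max_match`, the `max_len` parameter and the small sample lexicon are assumptions made for this example only.

```python
def reverse_max_match(text, lexicon, max_len=None):
    """Segment text right to left, taking the longest lexicon word
    that ends at the current position (hypothetical helper)."""
    if max_len is None:
        # Longest word in the lexicon bounds the matching window.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, shrinking it from the left.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                end -= size
                break
    tokens.reverse()  # Tokens were collected from right to left.
    return tokens


# Hypothetical toy lexicon, not taken from the thesis.
sample_lexicon = {"研究", "研究生", "生命", "起源"}
print(reverse_max_match("研究生命起源", sample_lexicon))
# -> ['研究', '生命', '起源']
```

Scanning from the right is what separates this from forward maximum matching: on the same input and lexicon, a left-to-right scan would first take "研究生" and leave "命" stranded as a single character.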
Keywords/Search Tags:electronic medical record, classification mining, entity recognition, Chinese word segmentation, attribute reduction, C4.5 algorithm