Research On The Construction Of Disease Forecasting Model Based On Electronic Medical Record Data

Posted on: 2018-08-29 | Degree: Master | Type: Thesis
Country: China | Candidate: P Wang | Full Text: PDF
GTID: 2348330515975994 | Subject: Medical informatics
Abstract/Summary:
Objectives: From the perspective of medical informatics, this study aims to construct a strategy for building disease prediction models, in order to explore how implicit knowledge can be discovered and made explicit from massive heterogeneous electronic medical record data. In parallel, an empirical study on constructing a predictive model of diabetic retinopathy is carried out to demonstrate that the theoretical strategy is scientific, rational, operable and extensible, and to provide decision support for disease prevention, diagnosis, control and treatment.

Methods: First, a literature review summarized the state of research in the relevant fields at home and abroad; knowledge discovery, information chain and decision support theory then served as guides for exploring a construction strategy for disease prediction models. Next, the population health sub-platform, one of the national science and technology resource platforms, provided medical data sets of diabetes patients for the empirical research. During data preprocessing, the missing data were analyzed, and the layered (stratified) mean fill method was used to classify and fill the missing values in the target data set. Three dimension reduction methods were applied separately to select the feature vectors: (1) principal component analysis, extracting factors with eigenvalues greater than 1; (2) extracting factors whose cumulative contribution rate exceeds 85%; and (3) logistic regression, extracting factors with significant differences. In the model building phase, the data set was first adjusted and the baseline accuracy determined. The predictive model was then constructed with the decision tree algorithm, while logistic regression, support vector machine, naive Bayes and radial basis function neural network algorithms were used to construct control experiment models. Finally, the forecast models were evaluated comprehensively on precision, recall, accuracy, F value, area under the ROC curve and Kappa value.

Results:
(1) Guided by the theories of knowledge discovery, information chain and decision support, a strategy for disease prediction based on medical data was developed, covering data integration and cleaning, data filling and dimension reduction, and model construction and evaluation.
(2) In the missing data filling stage, a filling scheme was devised. The target data were first divided into subsets by gender, age group and disease status; a chi-square (χ²) test found that prevalence in the target data set differed significantly between sexes and between age groups. The data were therefore classified by gender, age group and disease status, and missing values were filled with the layered mean fill method.
(3) In the dimension reduction stage, the same method was used to analyze the three dimension-reduced data sets and the original data set without dimension reduction. Analysis of variance showed that the differences among the four results were significant, and that the first dimension reduction method was more accurate and gave the better prediction effect.
(4) In the model construction and evaluation stage, the SMOTE method was used to balance the imbalanced dimensionless data set, and the baseline prediction accuracy of the model was determined to be 71.9166%. The data set processed by the first dimension reduction method was taken as the research object, and the prediction model was constructed with the decision tree algorithm. The model achieved a true positive rate (TP) of 0.975, a false positive rate (FP) of 0.045, a precision of 0.974, a recall of 0.975, an F value of 0.974, an area under the ROC curve (ROC Area) of 0.975, and a Kappa consistency statistic of 0.936.
(5) In the control experiment evaluation stage, logistic regression, support vector machine, naive Bayes and radial basis function neural network were selected to construct control models. Analysis of variance showed that the results of the control models and the decision tree differed significantly, and subsequent pairwise multiple comparisons showed that the decision tree algorithm performed better.

Conclusions:
(1) In the theoretical research phase, a strategy for constructing disease prediction models was established. The strategy includes an effective data filling scheme and the selection of the optimal dimension reduction method, and it can flexibly and efficiently guide the data mining process for massive heterogeneous electronic medical record data. Its specific steps are: data integration, cleaning and standardization, missing value processing, data filtering and dimensionality reduction, data balancing, and model building and evaluation.
(2) The results of the empirical research fit the process and principles of the theoretical strategy. The prediction model was validated through a series of evaluation indices and control experiment models, and the most effective prediction model was obtained. This demonstrates that the strategy for constructing disease prediction models based on electronic medical record data is scientific, reasonable and effective. The strategy can serve as a reference for knowledge discovery, integration and decision support in medical information.
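
The layered mean fill step (classifying records by gender, age group and disease status, then filling each missing value with the mean of its own group) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `stratified_mean_fill`, the field names (`sex`, `age_group`, `has_dr`, `hba1c`), the toy records, and the fallback to the overall mean for a group with no observed values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def stratified_mean_fill(records, strata_keys, value_key):
    """Return copies of `records` in which a missing `value_key` (None)
    is replaced by the mean of the observed values in the same stratum."""
    # Collect the observed (non-missing) values of each stratum.
    observed = defaultdict(list)
    for rec in records:
        if rec[value_key] is not None:
            stratum = tuple(rec[k] for k in strata_keys)
            observed[stratum].append(rec[value_key])
    stratum_means = {s: mean(vals) for s, vals in observed.items()}

    # Assumption: a stratum with no observed value falls back to the
    # overall mean (the abstract does not say how this case is handled).
    all_values = [v for vals in observed.values() for v in vals]
    overall_mean = mean(all_values) if all_values else None

    filled = []
    for rec in records:
        new_rec = dict(rec)  # leave the input records untouched
        if new_rec[value_key] is None:
            stratum = tuple(rec[k] for k in strata_keys)
            new_rec[value_key] = stratum_means.get(stratum, overall_mean)
        filled.append(new_rec)
    return filled


# Hypothetical toy records; the field names and values are illustrative only.
patients = [
    {"sex": "F", "age_group": "40-59", "has_dr": True, "hba1c": 8.0},
    {"sex": "F", "age_group": "40-59", "has_dr": True, "hba1c": None},
    {"sex": "F", "age_group": "40-59", "has_dr": True, "hba1c": 9.0},
    {"sex": "M", "age_group": "60+", "has_dr": False, "hba1c": 6.5},
    {"sex": "M", "age_group": "60+", "has_dr": False, "hba1c": None},
]
filled = stratified_mean_fill(patients, ("sex", "age_group", "has_dr"), "hba1c")
```

In this toy data the second record would be filled from the two observed values in its own group rather than from the whole data set, which is the point of stratifying when the groups differ significantly.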
Keywords/Search Tags: Data mining, Knowledge discovery, Diabetes, Retinopathy, Prediction model