
Research On The Construction Of Disease Prediction Model Based On Electronic Medical Record Data

Posted on: 2022-07-27  Degree: Master  Type: Thesis
Country: China  Candidate: J H Zheng
GTID: 2514306566491214  Subject: Computer technology
Abstract/Summary:
This thesis builds a diabetes risk prediction model from electronic medical record data, with the aim of exploring how disease prediction models can be constructed from large volumes of such data and of providing a basis for analyzing and predicting diabetes risk. The work offers guidance for predicting the fasting blood glucose index in the following year, for diagnosing diabetes, and for controlling blood glucose levels. Given a person's physical examination data, the model predicts that person's risk of diabetes in the coming year. The study applies big data research methods and investigates which feature selection and modeling algorithms are best suited to a diabetes risk model based on physical examination data. The work proceeds as follows. First, the electronic medical record data required for the research were obtained from a physical examination institution. After preprocessing the original data, 7,118 records containing 139 data items met the experimental requirements. For dimensionality reduction, sequential backward selection, principal component analysis (PCA), and a literature-based method were used as feature selection algorithms. The literature-based method selected the fewest features, sequential backward selection and PCA selected the same number of features, and the features chosen by sequential backward selection overlapped considerably with those chosen by the literature-based method. The three resulting data sets were then used for modeling: in the model construction stage, each data set was modeled with decision tree, random forest, support vector machine, logistic regression, and Naive Bayes algorithms. Finally, all models were cross-validated against multiple indicators. The results show that the data sets obtained by sequential backward selection and by the literature-based method perform similarly under each modeling algorithm, and both outperform the models built on the PCA-selected data set. Naive Bayes received the lowest overall evaluation. Support vector machine and logistic regression achieved high accuracy under all three feature selection algorithms but performed poorly on specificity and AUC. With the literature-based method or sequential backward selection, the diabetes prediction models based on decision tree and random forest achieved the better overall evaluation. In summary, when establishing a diabetes risk prediction model from three years of physical examination data, extracting features with sequential backward selection or the literature-based method and building the model with a decision tree or random forest predicts diabetes risk better.
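The evaluation above rests on several indicators that can disagree with one another, so a model may score well on accuracy while scoring poorly on specificity and AUC. The following is a minimal sketch of how these three indicators can be computed for a binary classifier. It is not the thesis's implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the label 1 is assumed to mean "diabetic".

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels; 1 is assumed to mean diabetic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all cases that were classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true negatives among all actual negatives."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random positive case scores
    higher than a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the thesis): 8 positive and 2 negative cases.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.6, 0.6, 0.7, 0.6]

# Assumed decision threshold of 0.5; every case here is predicted positive.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
spec = specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"accuracy={acc:.3f} specificity={spec:.3f} auc={area:.3f}")
```

On this toy data the classifier labels every case positive, so accuracy is 0.8 while specificity is 0 and AUC is about 0.656. This is one way high accuracy and weak specificity can coexist, which is why reporting several indicators side by side gives a fuller picture than accuracy alone.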
Keywords/Search Tags: physical examination data, fasting blood glucose, machine learning algorithm, diabetes risk prediction