
Research On Data Mining Method Of Diabetes Risk Based On Electronic Medical Record Analysis

Posted on: 2017-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: W X Xiao
Full Text: PDF
GTID: 2354330503486335
Subject: Computer Science and Technology
Abstract/Summary:
Fasting blood glucose (FBG) is an important indicator of human health. Predicting FBG is valuable for detecting and treating disease, especially diabetes mellitus. The experimental data come from a medical examination database and contain rich health records covering a long period. From the perspective of big data analysis, this thesis studies the relationships between a large number of medical examination items and changes in FBG, and proposes two models. First, based on four years of historical medical examination data, a model for predicting the following year's FBG is built using traditional data mining techniques, together with a novel algorithm for estimating the probability of FBG change and a proposed feature selection algorithm that combines the feature importance scores of ensemble learning with the Sequential Forward Selection (SFS) algorithm to select an optimal feature subset. Experiments on the medical examination data showed that feature selection yielded a set of examination items highly related to FBG and improved the model's performance. When the FBG change prediction model was tested on the medical examination data, the population at high risk of FBG change was analyzed in terms of sensitivity, specificity, and positive predictive value; the model gave good results for the population with rising FBG. Second, for the population with fasting blood glucose, the relationship between medical examination items and high FBG risk is studied in order to establish an early prediction model of FBG risk. This model considers the joint effect of any two medical examination items on changes in FBG by adding pairwise cross terms to the original data set. Based on the importance scores of the examination items and on feature selection, the cross terms that matter most for FBG were identified.
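The feature selection idea described above (rank features by ensemble importance scores, then run SFS) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the function name `importance_guided_sfs`, the `top_k` and `min_gain` parameters, and the stopping rule are all hypothetical, and the importance scores and the subset-scoring function are supplied by the caller (for example, from a random forest and a cross-validated metric).

```python
from typing import Callable, Dict, List


def importance_guided_sfs(
    importances: Dict[str, float],
    score_subset: Callable[[List[str]], float],
    top_k: int = 10,
    min_gain: float = 1e-4,
) -> List[str]:
    """Greedy forward selection restricted to the most important features.

    importances  -- feature name -> importance score (assumed to come from
                    an ensemble learner; not computed here)
    score_subset -- hypothetical callback that returns a validation score
                    for a list of feature names (higher is better)
    top_k        -- number of top-ranked features kept as SFS candidates
    min_gain     -- stop when the best addition improves the score by less
    """
    # Step 1: keep only the top_k features by importance as candidates.
    candidates = sorted(importances, key=importances.get, reverse=True)[:top_k]

    selected: List[str] = []
    best_score = float("-inf")

    # Step 2: Sequential Forward Selection over the candidates.
    while candidates:
        # Score every subset formed by adding one remaining candidate.
        trials = [(score_subset(selected + [f]), f) for f in candidates]
        trial_score, trial_feature = max(trials, key=lambda t: t[0])

        # Stop once no candidate improves the score by at least min_gain.
        if trial_score - best_score < min_gain:
            break

        selected.append(trial_feature)
        candidates.remove(trial_feature)
        best_score = trial_score

    return selected
```

Restricting SFS to the top-ranked candidates keeps the number of model evaluations small; whether the thesis prunes candidates this way is not stated in the abstract.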
From the set of cross terms, an important subset was selected by the feature selection method and used to build the FBG prediction model. Results obtained on the medical examination data with a random forest showed that feature selection improved performance. Because the positive and negative samples in the data set are extremely imbalanced, the model uses a method based on under-sampling and model combination to handle the imbalanced classification problem, and it achieved good results...
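One common way to combine under-sampling with model combination is to train several base models, each on all minority-class samples plus an equally sized random draw from the majority class, and then average their scores. The sketch below illustrates that pattern only; the abstract does not give the number of sub-models, the base learner, or the combination rule, so `undersampled_ensemble`, `fit_centroid`, `n_models`, and score averaging are all assumptions. The nearest-centroid scorer is a stand-in so the example runs without external libraries.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]  # returns a score in [0, 1] for class 1


def fit_centroid(X: List[Row], y: List[int]) -> Model:
    """Toy base learner (an assumption): score by relative centroid distance."""
    def mean(rows: List[Row]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c1 = mean([x for x, label in zip(X, y) if label == 1])
    c0 = mean([x for x, label in zip(X, y) if label == 0])

    def dist2(a: Row, b: Row) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def score(row: Row) -> float:
        d1, d0 = dist2(row, c1), dist2(row, c0)
        # Closer to the class-1 centroid gives a score nearer to 1.
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

    return score


def undersampled_ensemble(
    X: List[Row],
    y: List[int],
    fit: Callable[[List[Row], List[int]], Model] = fit_centroid,
    n_models: int = 5,
    seed: int = 0,
) -> Model:
    """Train n_models base models on balanced subsets and average their scores."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)

    models: List[Model] = []
    for _ in range(n_models):
        # Under-sample the majority class down to the minority class size.
        sampled = rng.sample(majority, len(minority))
        idx = minority + sampled
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row: Row) -> float:
        # Model combination: average the base-model scores.
        return sum(m(row) for m in models) / len(models)

    return predict
```

Each base model sees a balanced training set, while the ensemble as a whole still uses many different majority-class samples, which is the usual motivation for this design.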
Keywords/Search Tags: fasting blood glucose, data mining, feature selection, prediction of FBG rise risk, imbalanced classification