Research On The Risk Assessment Of Diseases Based On Multiple Imputation And Generalized Estimating Equation

Posted on: 2018-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: C H Zhao
GTID: 2334330542485797
Subject: Social Medicine and Health Management
Abstract/Summary:
Objective

This study aimed to address missing data and repeated measurements by means of multiple imputation and generalized estimating equations when assessing disease risk factors from elderly health management archives data. A simulation experiment was conducted to test whether parameter estimates become more accurate after missing data are filled in. The research provides an analytic method for making use of health management archives data and a scientific reference for disease prevention and control.

Methods

Elderly health management archives data were gathered from The 3rd People's Hospital of Xiangcheng District for 2011 to 2015 (except 2013). Two common diseases were selected as research examples: hyperglycemia (a numerical variable) and gallbladder stones (a categorical variable). Multiple imputation was used to handle the missing data, generalized estimating equations were used to handle the repeated measurements, and the results from the imputed data sets were then combined into a comprehensive inference. In the simulation experiment, parameters were estimated from 1000 unfilled data sets and 1000 filled data sets and compared with the estimates obtained from the complete data set to determine which approach was more accurate. The frequencies of type I and type II errors under the different data sets were also compared.

Results

1. Characteristics and distribution of the research data. The study gathered health management archives for 8325 distinct elderly individuals, accumulating 23195 observations. Among the subjects, the minimum age was 57 in every year, and the maximum age was 96, in 2014. Females outnumbered males in each year. Across the four rounds of data collection, 1668 people were recorded once, 1452 twice, 2197 three times, and 3008 four times.

2. Multiple imputation. Of all included indicators, only gender and age had no missing values; every other variable had missing data. Missing rates ranged from 0.06% to 18.44%, and the actual effective rate of the sample was 76.99%. For both the hyperglycemia and the gallbladder stones analyses, the data sets had an arbitrary missing pattern, and multiple imputation produced 10 different complete data sets. The relative efficiency of every variable was above 0.97.

3. Assessment of disease risk factors. The multi-factor comprehensive inference for hyperglycemia gave the following OR values (95% CI): high blood pressure, 1.27 (1.20-1.36); overweight, 1.25 (1.16-1.34); obesity, 1.65 (1.48-1.85); a racing heart, 1.68 (1.58-1.79); hypercholesterolemia, 1.18 (1.06-1.31); combined hyperlipidemia, 1.17 (1.02-1.33); low density lipoprotein cholesterol, 1.11 (1.04-1.17); high blood uric acid, 1.17 (1.09-1.27); high alanine transaminase, 1.18 (1.03-1.35); high aspartate transaminase, 1.19 (1.03-1.39). The multi-factor comprehensive inference for gallbladder stones gave the following OR values (95% CI): overweight, 1.14 (1.08-1.21); obesity, 1.27 (1.16-1.38); hypertriglyceridemia, 1.12 (1.06-1.19); low HDL levels, 1.17 (1.05-1.31); high serum creatinine, 1.20 (1.03-1.39); high aspartate transaminase, 1.12 (1.02-1.22); hyperglycemia, 1.09 (1.04-1.15); old age, 1.37 (1.29-1.45); female, 1.85 (1.65-2.06).

4. Comparison of parameter estimates. When the different risk models for hyperglycemia were compared, the statistical significance of combined hyperlipidemia and high aspartate transaminase changed. When the different risk models for gallbladder stones were compared, the statistical significance of low density lipoprotein cholesterol and high serum creatinine changed. The ? and r showed that missing data had the greatest impact on the parameter estimate for body mass index.

5. Results of the simulation experiment. Over the 1000 repetitions, type I errors occurred in the filled or unfilled model for five variables: anemia, hypertriglyceridemia, combined hyperlipidemia, low HDL levels, and high aspartate transaminase. The respective frequencies in the filled versus unfilled models were 0 vs 1, 0 vs 5, 300 vs 388, 0 vs 5, and 56 vs 244. Type II errors were also reduced; for hypercholesterolemia, for example, the frequency fell from 163 to 11. After filling, the coefficients of gender, age, high blood pressure, a racing heart, anemia, hypertriglyceridemia, hypercholesterolemia, combined hyperlipidemia, low density lipoprotein cholesterol, high serum creatinine, high blood uric acid, and high alanine transaminase were closer to the coefficients calculated from the complete data set.

Conclusions

Missing data and repeated measurements can be handled effectively by multiple imputation and generalized estimating equations when assessing disease risk factors from elderly health management archives data. This offers an analytical method for making use of such archives and an approach for other data analyses that face the same problems. When public health departments carry out health management for the elderly, surveillance of the factors related to hyperglycemia and gallbladder stones should be strengthened, health education should be enhanced, and unhealthy lifestyles should be modified, so that the occurrence of related diseases can be controlled.
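The abstract reports that estimates from 10 imputed data sets were combined into one inference and that every variable had a relative efficiency above 0.97, but it does not spell out the pooling arithmetic. A minimal sketch follows, assuming the standard Rubin's rules and the usual relative-efficiency formula RE = 1 / (1 + lambda / m); the thesis does not confirm that these exact formulas were used. The function names and input numbers are hypothetical and are not taken from the thesis, and the normal-quantile interval is a simplification of the t-based interval normally paired with Rubin's rules.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets (Rubin's rules).

    estimates: per-imputation point estimates (e.g. log odds ratios from a GEE fit)
    variances: per-imputation squared standard errors
    Returns (pooled estimate, total variance, fraction of missing information).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = sum(estimates) / m                                 # pooled estimate
    w_bar = sum(variances) / m                                 # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)     # between-imputation variance
    total = w_bar + (1 + 1 / m) * b                            # total variance
    # Large-sample approximation of the fraction of missing information.
    fmi = (1 + 1 / m) * b / total
    return q_bar, total, fmi


def relative_efficiency(fmi, m):
    """Efficiency of m imputations relative to infinitely many."""
    return 1.0 / (1.0 + fmi / m)


def odds_ratio_ci(log_or, total_var, z=1.96):
    """Odds ratio with an approximate 95% CI (normal quantile, an assumption)."""
    se = math.sqrt(total_var)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical log-OR estimates and variances from 10 imputed data sets.
    est = [0.24, 0.25, 0.23, 0.26, 0.24, 0.25, 0.22, 0.24, 0.25, 0.23]
    var = [0.0010] * 10
    q, t, fmi = pool_rubin(est, var)
    print("pooled OR and CI:", odds_ratio_ci(q, t))
    print("relative efficiency:", relative_efficiency(fmi, len(est)))
```

Under this formula, with m = 10 a relative efficiency above 0.97 corresponds to a fraction of missing information below roughly 0.31, which is one way to read the reported threshold.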
Keywords/Search Tags: Health Management Archives, Longitudinal Data, Multiple Imputation, Generalized Estimating Equation