Construction Of Zero-inflated Model And Its Application In Health Service Survey

Posted on: 2017-11-30 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Xu
GTID: 2334330485481375 | Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: In recent years, with the continuous development of medical information technology, large amounts of raw medical data, such as epidemiological survey data and hospital records, have been recorded and preserved. Besides growing rapidly in volume, medical data are also improving in quality and accuracy. How to choose appropriate statistical methods to mine these data, so as to better serve health services management, hospital clinics, research and teaching, and to support medical management decisions, has become a widely discussed issue in statistics both in China and abroad. Count data are common in medical research, and Poisson regression is the model most often used to analyze them. In practice, such count data often contain a large number of zeros, which is particularly common in epidemiological data. The excess-zero phenomenon means that the number of zeros in the count data is significantly larger than expected under a standard discrete distribution such as the Poisson, binomial or negative binomial distribution. This phenomenon causes overdispersion. If an ordinary count model is still used to fit the data, the parameter estimates can be heavily biased and the inference may even be wrong. Therefore, based on the characteristics of such data, a zero-inflated Poisson (ZIP) regression model can be established as a mixture of two parts: a zero part (a degenerate distribution at zero) and a count part (a Poisson distribution).

Aim: To address the common problem of data sets with excess zeros in medical research, this study constructs zero-inflated models. For small samples, we introduce the Bayesian method and build Bayesian zero-inflated models, then compare them with traditional models under multiple conditions of sample size and zero proportion, evaluating the models in terms of accuracy, precision and goodness of fit. We explore the optimal parameter estimation model for different data scenarios. To increase the reliability of the model estimates, we also introduce the bootstrap technique. The study can provide methodological support for future medical statistical analyses of data with excess zeros.

Method: First, we simulate data with sample sizes of 1000, 500 and 100. To reflect different degrees of dispersion, we set zero proportions of 0.9, 0.8, 0.7 and 0.6, so as to identify the optimal model under different data conditions.

1. Model construction: For large samples, we build zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models, which are compared with the traditional Poisson and negative binomial regression models. For small samples, we build Bayesian ZIP and Bayesian ZINB regression models, which are compared with the Bayesian Poisson and Bayesian negative binomial regression models. During model building we also apply the bootstrap, resampling with replacement at the original sample size. Each run draws 200 resamples, which are then analyzed.

2. Model evaluation: The simulation results are evaluated in three aspects, namely accuracy, precision and model fit, using five indicators: absolute bias, confidence interval width, standard error, confidence interval coverage and model fit. Based on these five indicators we evaluate the simulated models comprehensively, which may provide a methodological reference for modeling different kinds of medical data.

Case Study: The constructed zero-inflated models are applied to a health service survey. In the large-sample analysis, we analyze the factors influencing chronic diseases among residents of Shanghai. We also analyze the factors influencing the number of hospitalizations among residents with agricultural household registration in Pudong New Area, Shanghai. By building different models and analyzing real data, we validate the results of the simulation study.

Result: The results are divided into two parts, one based on large samples and the other on small samples. For large samples we build four models. When the sample size is 1000 or 500, as the zero proportion increases, the absolute bias and confidence interval width of the two traditional models increase (accuracy), and their standard errors also increase while their confidence interval coverage falls steadily (precision). The accuracy and precision of the traditional count models are therefore low, and these models are not well suited to data with excess zeros. Under the same conditions, the zero-inflated models perform much better than the traditional models. The AIC values of the zero-inflated models are generally smaller than those of the basic count models, among which negative binomial regression is better than Poisson regression. When the zero proportion is 0.6 or 0.7, the ranking of model fit from best to worst is: ZIP regression, ZINB regression, negative binomial regression and Poisson regression. When the zero proportion is 0.8 or 0.9, ZIP and ZINB regression fit similarly, and both are better than negative binomial regression, which in turn is better than Poisson regression.

For small samples we also build four models. When the zero proportion is 0.9, the Bayesian Poisson and Bayesian negative binomial models cannot be fitted, which indicates that traditional count models are poorly suited to data with excess zeros. When the zero proportion is 0.6, 0.7 or 0.8, ZIP and ZINB regression perform similarly in terms of accuracy, precision and fit. The ranking from best to worst is: Bayesian negative binomial regression, Bayesian ZIP regression and Bayesian Poisson regression. When the zero proportion is 0.9, Bayesian ZIP and Bayesian ZINB regression fit similarly, and both are better than the traditional Bayesian regression models.

In the practical analyses, the large-sample results are similar to those of the simulation, showing that the zero-inflated model, from which we obtain a series of risk factors for chronic diseases, is superior to the traditional model for data with excess zeros. The small-sample results are broadly consistent with the large-sample results.

Conclusions: Given the characteristics of zero-inflated data, an appropriately chosen zero-inflated count model is superior to the traditional model and effectively reduces bias. The Bayesian zero-inflated model is slightly better than the traditional one in small samples. The performance of zero-inflated models on hierarchical and high-dimensional zero-inflated data still needs further exploration and research.
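
The abstract does not give the thesis's code or software, so the following Python sketch only illustrates the kind of comparison described above under stated assumptions. It simulates one zero-inflated Poisson data set, fits an intercept-only ZIP model with a simple EM algorithm (a substitute for the regression models in the thesis, which include covariates), compares it with a plain Poisson fit on absolute bias and AIC, and draws 200 bootstrap resamples with replacement at the original sample size. The structural-zero probability of 0.6, the Poisson mean of 3.0, the seed and all function names are hypothetical choices made for illustration.

```python
import math
import random

def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_zip(rng, n, pi, lam):
    """Structural zero with probability pi, otherwise a Poisson(lam) count."""
    return [0 if rng.random() < pi else sample_poisson(rng, lam) for _ in range(n)]

def fit_zip_em(y, iters=200):
    """Intercept-only ZIP fit by EM; returns (pi_hat, lam_hat)."""
    n, n0, total = len(y), sum(1 for v in y if v == 0), sum(y)
    pi, lam = 0.5, max(total / n, 1e-8)
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean
        pi = n0 * z / n
        lam = total / (n - n0 * z)
    return pi, lam

def loglik_poisson(y, lam):
    return sum(-lam + v * math.log(lam) - math.lgamma(v + 1) for v in y)

def loglik_zip(y, pi, lam):
    ll = 0.0
    for v in y:
        if v == 0:
            ll += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            ll += math.log(1.0 - pi) - lam + v * math.log(lam) - math.lgamma(v + 1)
    return ll

def bootstrap_lam(rng, y, reps=200):
    """Percentile 95% interval for lam_hat from resamples with replacement."""
    est = sorted(fit_zip_em(rng.choices(y, k=len(y)))[1] for _ in range(reps))
    return est[int(0.025 * reps)], est[int(0.975 * reps) - 1]

rng = random.Random(2017)            # arbitrary seed, for reproducibility only
true_pi, true_lam = 0.6, 3.0         # assumed values, not taken from the thesis
y = simulate_zip(rng, 1000, true_pi, true_lam)

lam_pois = sum(y) / len(y)           # Poisson MLE ignores the excess zeros
pi_hat, lam_hat = fit_zip_em(y)

aic_pois = 2 * 1 - 2 * loglik_poisson(y, lam_pois)   # one parameter
aic_zip = 2 * 2 - 2 * loglik_zip(y, pi_hat, lam_hat)  # two parameters
ci_low, ci_high = bootstrap_lam(rng, y)

print("absolute bias of lambda: Poisson", abs(lam_pois - true_lam),
      "ZIP", abs(lam_hat - true_lam))
print("AIC: Poisson", aic_pois, "ZIP", aic_zip)
print("bootstrap 95% interval width for lambda:", ci_high - ci_low)
```

Repeating the run over the sample sizes and zero proportions listed in the Method, and recording how often the interval contains the true value, would give the coverage indicator. The negative binomial and Bayesian variants need additional machinery that this sketch leaves out.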
Keywords/Search Tags:health service survey, zero-inflated phenomenon, bias, zero-inflated model, Bayesian zero-inflated model