The Statistical Inference For Finite Mixture Count Data Model With Missing Data

Posted on: 2016-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: P X Chu
Full Text: PDF
GTID: 2297330470968035
Subject: Quality statistics
Abstract/Summary:
Count data are a common kind of discrete data that take only non-negative integer values (0, 1, 2, ...) and record how often an event occurs in a unit of time or space, such as the number of defective products, the number of defects, the number of traffic accidents, the number of hospital registrations, or the number of forest fires. Such data arise widely in finance and insurance, medicine, genetics, clinical diagnosis, psychology, and other fields. Because of these characteristics, the Poisson regression model and the negative binomial regression model are the most common models for analyzing them. Poisson regression is the basic model for fitting count data and has been applied widely in many fields of study. It requires that events be independent, meaning that earlier events have no influence on later ones, and that the conditional mean equal the conditional variance. In practice this assumption often cannot be met, and negative binomial regression extends Poisson regression to that case. In real data, the variance is often larger than the mean; such data are called over-dispersed count data. Over-dispersion has many possible causes. One is zero inflation: the data contain so many zero observations that their proportion far exceeds what Poisson or negative binomial regression can predict. Failure to account for the excess zeros may cause biased parameter estimates and misleading inferences. Besides zero inflation, over-dispersion may stem from missing data, from general sources of heterogeneity, or from the combined effect of these factors.
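As a rough illustration of the two symptoms described above, the following sketch computes the variance-to-mean ratio of a sample and compares its observed share of zeros with the share a Poisson distribution with the same mean would predict. The function names and the sample counts are hypothetical and are not taken from the thesis.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return variance(counts) / mean(counts)


def zero_fractions(counts):
    """Return (observed zero share, zero share predicted by a Poisson with the same mean)."""
    observed = sum(1 for c in counts if c == 0) / len(counts)
    predicted = math.exp(-mean(counts))  # Poisson P(Y = 0) = exp(-lambda)
    return observed, predicted


# Hypothetical counts, made up for illustration only.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 7]

index = dispersion_index(counts)
observed, predicted = zero_fractions(counts)
print(f"variance / mean = {index:.2f}")
print(f"zeros observed = {observed:.2f}, zeros predicted by Poisson = {predicted:.2f}")
```

A ratio well above 1 suggests over-dispersion, and an observed zero share well above the Poisson prediction suggests zero inflation. Both are informal checks, not formal tests.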
Building a different model for each cause makes the data analysis more complex, and without a reasonable explanation of the cause the statistical inference may be biased. In this paper, a finite mixture model of hurdle Poisson distributions with missing data is proposed, and a stochastic EM algorithm is developed to obtain the maximum likelihood estimates of the model parameters and mixing proportions. Specifically, the missing data are assumed to be missing not at random (MNAR) or missing at random (MAR), and the corresponding missingness mechanism is modeled through probit regression. To improve the efficiency of the algorithm, a stochastic step based on data augmentation is incorporated into the E-step, whereas the M-step is solved by conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The model is a general framework that captures important characteristics of count data such as zero inflation or deflation, heterogeneity, and missingness; it provides more insight into the features of the data and allows dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed in place of their conditional expectations, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.
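To make the model structure concrete, here is a minimal sketch of a finite mixture of hurdle Poisson components and of one stochastic draw of a component label. It assumes the standard hurdle form (a point mass at zero plus a zero-truncated Poisson for positive counts) with a separate zero probability and rate per component. It leaves out covariates, the probit missingness model, and the conditional maximization step, so it should not be read as the thesis's algorithm; all names and parameter values are hypothetical.

```python
import math
import random


def hurdle_poisson_pmf(y, zero_prob, rate):
    """P(Y = y): mass zero_prob at 0, zero-truncated Poisson(rate) on y >= 1."""
    if y == 0:
        return zero_prob
    poisson = math.exp(-rate) * rate**y / math.factorial(y)
    return (1.0 - zero_prob) * poisson / (1.0 - math.exp(-rate))


def mixture_pmf(y, weights, zero_probs, rates):
    """Finite mixture: weighted sum of the component probability mass functions."""
    return sum(
        w * hurdle_poisson_pmf(y, p, r)
        for w, p, r in zip(weights, zero_probs, rates)
    )


def sample_component(y, weights, zero_probs, rates, rng):
    """Stochastic step: draw a component label from its posterior given y."""
    joint = [
        w * hurdle_poisson_pmf(y, p, r)
        for w, p, r in zip(weights, zero_probs, rates)
    ]
    # random.Random.choices normalizes the weights internally.
    return rng.choices(range(len(weights)), weights=joint, k=1)[0]


# Hypothetical two-component setting, for illustration only.
weights = [0.6, 0.4]     # mixing proportions
zero_probs = [0.7, 0.1]  # per-component probability of a zero
rates = [1.0, 6.0]       # per-component Poisson rates

rng = random.Random(0)
label = sample_component(3, weights, zero_probs, rates, rng)
total = sum(mixture_pmf(y, weights, zero_probs, rates) for y in range(60))
print(label, round(total, 6))
```

Because the zero probability is a free parameter in each component, a component can place either more or less mass at zero than a plain Poisson would, which is how a hurdle model accommodates both zero inflation and zero deflation. Drawing the label instead of averaging over it is what keeps this step down to simulation from a standard discrete distribution.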
Keywords/Search Tags:zero inflation, finite mixture model, missing data, stochastic EM algorithm, model selection