
Study On The Variable Selection Problems In Dispersion Modeling

Posted on: 2010-03-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D R Wang
GTID: 1100360275951153
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Variable selection is fundamental to statistical modeling, and a large number of researchers have devoted themselves to variable selection problems. With the development of modern technology, more and more complicated data and models have emerged. Hierarchical regression models, which can analyze such data better, are an important part of them. However, most of the literature is concerned with variable selection for the mean regression model, and few methods have been proposed for joint modeling of the mean and dispersion. In our research, we find that variable selection methods that are adequate for mean models may fail to extend directly to hierarchical regression models. Thus, it is necessary to study variable selection problems for complicated models. This dissertation is concerned with variable selection problems in joint mean and dispersion modeling. Furthermore, the idea of variable selection is applied to the field of data diagnostics. Our research results include the following three conclusions.

For heteroscedastic regression models, simultaneous variable selection for the mean model and the variance model is discussed. When the number of mean parameters is a large fraction of the sample size, the MLEs of the variance parameters can be seriously biased, and the model risk based on such estimators increases. We propose a criterion named PICa based on the adjusted profile log-likelihood function, which has been used to reduce the bias of the variance component estimators. Our method differs from conventional ones in that it combines the information of the mean model with the information of the variance model, and PICa puts suitable weights on the mean and variance variable penalties. Thus, it can simultaneously select the variables for the mean and variance models. Under regularity conditions, we prove that PICa has the following asymptotic properties: for the mean model, PICa is consistent for model selection; for the variance model, the probability of underfitting is zero. Monte Carlo simulations show that PICa performs better than conventional methods in many common situations.

For double generalized linear models, on the one hand, we propose a variable selection criterion based on the extended quasi-likelihood. The new criterion is an extension of Akaike's information criterion, and its performance is investigated through simulation studies and a real data application. On the other hand, variable selection problems for high-dimensional generalized linear models with dispersion modeling are studied. When there are many variables and not enough data, subset selection methods may fail to distinguish among the large number of candidate models, and they are hard to put into practice because of the heavy computation. We propose a class of non-concave penalized extended quasi-likelihood methods, prove the oracle property of the resulting estimates, and put forward a new algorithm for the new procedure. At the same time, considering that the properties of the estimates depend on the penalty function, we improve the choice of tuning parameters in the penalty function from the standpoint of model selection consistency.

As part of a modeling strategy, variable selection is an important tool for reflecting the essence of data fitting. Thus, it can also be applied to other fields of statistical modeling. We focus on the masking effects between the diagnosis of outliers and the diagnosis of response transformation in regression analysis. Based on the idea of variable selection, a simultaneous diagnosis method is proposed by constructing covariates and employing the generalized information criterion. The efficiency of the proposed approach is compared with that of naive methods through a Monte Carlo simulation and two examples.
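
The abstract does not state which non-concave penalty is used, but the keywords list SCAD. As an illustration only, the following is a minimal Python sketch of the standard SCAD penalty of Fan and Li and its derivative. The function names `scad_penalty` and `scad_derivative` are hypothetical, and the default `a = 3.7` is the value commonly suggested for SCAD, not a choice confirmed by the dissertation.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Standard SCAD penalty p_lam(|theta|); requires lam >= 0 and a > 2.

    Assumption: this is the textbook SCAD form, not necessarily the exact
    penalty or tuning used in the dissertation.
    """
    if lam < 0 or a <= 2:
        raise ValueError("SCAD requires lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # L1-like region: small coefficients are shrunk toward zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region between the L1 part and the flat part.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Flat region: large coefficients receive a constant penalty.
    return lam * lam * (a + 1) / 2


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the SCAD penalty with respect to |theta|, for |theta| > 0."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)
```

The derivative is zero once `|theta|` exceeds `a * lam`, so large coefficients are not shrunk further; this is the feature usually associated with the oracle property of non-concave penalized estimators.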
Keywords/Search Tags: variable selection, double generalized linear models, heteroscedastic regression models, profile likelihood function, extended quasi-likelihood function, AIC, BIC, penalized function, SCAD, outlier