
A New AICc-based Information Criterion: bAICc

Posted on: 2018-11-04    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G F Son    Full Text: PDF
GTID: 1310330542950126    Subject: Probability theory and mathematical statistics
Abstract/Summary:
Model building is a central issue in data analysis. When we build a model, we must detect which covariates affect the response variable and decide which of them should be included in the model. Covariate selection is therefore fundamental to data analysis. Many statistical methods exist for choosing the covariates of a "best" model, where "best" refers either to goodness of fit or to predictive ability. Among the many approaches to variable selection, information criteria form a commonly used class.

AIC is a widely used criterion, and it can be regarded as the first model selection criterion to gain widespread acceptance. Each model has an AIC value; the smaller the AIC value, the better the corresponding model. It is well known that the traditional AIC tends to over-specify: it may select a model that includes spurious variables, and in small samples it may even select the full model as the "best" model. Many researchers have revised this criterion in different model frameworks, and AICc is an effective and widely used correction of AIC. The main advantage of AICc is that it outperforms AIC in covariate selection for small samples; however, AICc may lose this advantage as the sample size increases.

Inspired by AICc's merits for small samples, we propose a new criterion for linear and generalized linear models. The new criterion is based on AICc, and we call it blockwise AICc (bAICc) in this paper. We study the performance of bAICc as the sample size of the blocks increases from small to large. Furthermore, consistency is an attractive property for a model selection criterion: if an information criterion is consistent, then as the sample size tends to infinity it selects the true model with probability tending to one. Under some assumptions in the linear
regression model framework, we also prove that AICc is consistent over the under-specified model set, and that the probability that the true model's AICc value is smaller than that of any over-specified model exceeds 1/2. Moreover, we prove that bAICc is a consistent model selection criterion in the linear regression framework.

Denote the true model by M0 and consider candidate models Mk, k = 1,...,K. Let the candidate model set be A = {Mk : k = 1,...,K}, the over-specified model set be A1 = {Mk ∈ A : M0 ⊂ Mk, Mk ≠ M0}, and the under-specified model set be A2 = {Mk ∈ A : M0 ⊄ Mk or Mk = M0}. Then A1 ∪ A2 = A. For Theorem 1, in the linear regression framework, define βE as the regression coefficient vector of the true model M0 and β as the regression coefficient vector of the candidate model Mk. Both βE and β are p × 1 vectors; βE has p0 non-zero elements and β has p non-zero elements, so the dimension of the true model is p0 and that of the candidate model is p, with p ≥ p0. Let l(βE; y) and l(β; y) be the log-likelihood functions of the true model and the candidate model, respectively. Assume: (C1) l(βE; y) − l(β; y) converges to a positive constant or to positive infinity; (C2) β̂E is a consistent estimator of βE; (C3) β̂ is not a consistent estimator of βE, where the hat denotes the maximum likelihood estimate. Under these three assumptions, we have Theorem 1.

Theorem 1. When selecting the best model from the under-specified model set A2, AICc is a consistent information criterion, i.e., AICc(M0) < AICc(Mk) always holds for Mk ∈ A2 with Mk ≠ M0, where AICc(M0) and AICc(Mk) are the AICc values of the true model and the candidate model.

For the over-specified model set A1, we have Theorem 2.

Theorem 2. Over the over-specified model set A1, AICc is not a consistent information criterion: it may choose an over-specified model. However, the probability that the true model M0's AICc value is smaller than that of any over-specified model Mk exceeds 1/2, i.e., P{AICc(M0) < AICc(Mk)} > 1/2 for Mk ∈ A1 with Mk ≠ M0.

By the theorems above, AICc can eliminate the under-specified models but cannot eliminate the over-specified ones; it is not a consistent information criterion, since with positive probability it chooses an over-specified model. Without loss of generality, assume that A contains only one over-specified model. Then, by Theorems 1 and 2, we have Theorem 3 on the consistency of bAICc.

Theorem 3. As the number of small blocks B → +∞, bAICc is a consistent information criterion, with P{bAICc(M0) < bAICc(Mk)} → 1 for Mk ∈ A with Mk ≠ M0, where bAICc(M0) and bAICc(Mk) are the bAICc values of the true model and the candidate model.

Simulations of linear, binomial, Poisson, and gamma regression, using the newly proposed bAICc as well as six other commonly used criteria, demonstrate that bAICc performs well under a range of model settings. It also outperforms inconsistent model selection criteria for large samples, which suggests that bAICc may be consistent in the generalized linear model framework as well. Three data sets are also studied in this paper: the first is a child birthrate study, the second a survival study of snails under certain ecological conditions, and the last an analysis of fish infection data. The simulations and real-data analyses illustrate that the new information criterion performs well in covariate selection.
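As an illustration of the ideas above, the following sketch computes AICc for a Gaussian linear model and a blockwise variant. The abstract does not give the exact definition of bAICc, so the `baicc` function below is a hypothetical reading (split the sample into B blocks, compute AICc within each block, and sum), not the dissertation's actual formula; the standard small-sample form AICc = −2 log L + 2k + 2k(k+1)/(n − k − 1) is used for each block.

```python
import numpy as np

def aicc(loglik, k, n):
    """Standard AICc: AIC plus the small-sample correction 2k(k+1)/(n-k-1).
    Requires n > k + 1 so the correction term is positive."""
    return -2.0 * loglik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

def gaussian_loglik(y, X):
    """Maximized log-likelihood of a Gaussian linear model fit by OLS."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n  # MLE of the error variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def baicc(y, X, n_blocks):
    """Hypothetical blockwise AICc: partition the sample into n_blocks
    blocks, compute AICc within each block, and sum the values.
    (Illustrative reading of bAICc, not the dissertation's definition.)"""
    idx = np.array_split(np.arange(len(y)), n_blocks)
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    total = 0.0
    for block in idx:
        yb, Xb = y[block], X[block]
        total += aicc(gaussian_loglik(yb, Xb), k, len(yb))
    return total
```

In a variable-selection loop one would evaluate `baicc` for each candidate design matrix (each subset of covariates) and keep the model with the smallest value, mirroring the smaller-is-better convention of AIC and AICc.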
Keywords/Search Tags:model selection, information criterion, AICc, bAICc, consistency, quasi-likelihood