Variable Selection Methods Based On Penalized Likelihood Function And Their Applications In High-dimensional Model

Posted on: 2018-04-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Zhu
GTID: 1319330518959900
Subject: Quantitative Economics
Abstract/Summary:
With the rapid development of information technology, the amount of information available, along with the dimension of the variables, keeps increasing. How to select the best model from so many candidates has become an important topic in econometrics. A good variable selection method overcomes the shortcomings of traditional methods, which include heavy computation and overfitting. Moreover, the selected model has good prediction accuracy, and interfering variables are effectively eliminated to obtain the simplest model. The penalized likelihood function method is a continuous optimization process; it is more stable than discrete methods and can be solved by a reasonable algorithm even when the number of variables is large. Therefore, for high-dimensional models, model selection by the penalized likelihood function method is more effective, accurate and stable.

In this dissertation, based on the penalized likelihood function method, we propose variable selection methods for several types of high-dimensional models. The proposed methods select the model and estimate the parameters simultaneously. In addition, using the theory of probability and mathematical statistics, we show that the estimators possess the Oracle property: the estimator correctly selects the covariates with nonzero coefficients with probability converging to one, and the estimators of the nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Specifically, we obtain the following main conclusions.

Firstly, inspired by the bridge estimation method, we propose an adaptive bridge estimation method for a high-dimensional model. The adaptive bridge estimator applies different weights in the penalty term according to the importance of each variable. We then check whether the proposed estimator meets the standard of a good estimator, that is, whether it satisfies the two conditions of the Oracle property stated above. Under appropriate conditions, we prove that the adaptive estimator enjoys the Oracle property. The numerical and empirical performance of the proposed estimator is demonstrated by simulation and real data.

Secondly, we study the M-estimation method for the high-dimensional linear regression model and discuss the properties of the M-estimator when the penalty term is given by a local linear approximation. M-estimation is a framework that covers least absolute deviation regression, quantile regression, least squares regression and Huber regression. When the data contain outliers or the error term has a heavy-tailed distribution, least absolute deviation, a special case of M-estimation, is more robust than least squares. In theory, taking M-estimation combined with the local linear approximation as the objective function, we show under certain assumptions that the proposed estimator possesses good properties. In the numerical simulation, we choose an appropriate algorithm and show that the method is robust. Moreover, in the ultra-high dimensional setting, the simulation study demonstrates that forward regression combined with our proposed method performs competitively. In the empirical part, using real data, we also show that the proposed method selects variables and estimates parameters well.

Finally, we study a method for identifying credit default customers based on the high-dimensional Logistic model. Using the Logistic model, which is commonly used in credit scoring, we identify the important factors for credit default customers. At the same time, we measure and forecast customers' credit default risk based on the proposed Logistic model. Numerical simulation results show that the variable selection method we propose is effective. The empirical results also show that the proposed variable selection method selects variables with high explanatory and predictive ability.
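To make the idea of a weighted penalty concrete, the following is a minimal sketch and not the dissertation's own algorithm. It assumes a linear model with squared-error loss and swaps in an adaptive L1 (adaptive lasso) penalty in place of the adaptive bridge penalty, because the weighted L1 problem has a simple closed-form coordinate update. The function names (`soft_threshold`, `weighted_lasso`, `adaptive_lasso`, `demo`), the tuning values `lam=0.1` and `gamma=1.0`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for
    (1/2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # x_j'x_j / n for each column
    resid = y.astype(float).copy()         # equals y - X @ beta while beta = 0
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)   # keep resid = y - X @ beta
            beta[j] = new
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Assumed two-step scheme: least-squares initial fit, then weights
    1 / |initial estimate|^gamma, so weak variables are penalized more."""
    init = np.linalg.lstsq(X, y, rcond=None)[0]
    weights = 1.0 / (np.abs(init) ** gamma + 1e-8)
    return weighted_lasso(X, y, lam, weights)


def demo(seed=0):
    """Simulated example (hypothetical data): 3 of 10 coefficients are nonzero."""
    rng = np.random.default_rng(seed)
    n, p = 200, 10
    beta_true = np.zeros(p)
    beta_true[[0, 1, 4]] = [3.0, 1.5, 2.0]
    X = rng.standard_normal((n, p))
    y = X @ beta_true + rng.standard_normal(n)
    return beta_true, adaptive_lasso(X, y, lam=0.1)
```

Calling `demo()` returns the true coefficient vector and the fitted one, so the selected support can be compared with the true support. The weights are what make the penalty adaptive: a variable with a small initial estimate receives a large weight and is more likely to be set exactly to zero, while strong variables are shrunk only slightly.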
Keywords/Search Tags: variable selection, penalized likelihood function, high-dimensional data, Oracle property