
Research On Some Problems Of Variable Selection In Regression Model

Posted on: 2018-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Su
Full Text: PDF
GTID: 2359330518966659
Subject: Statistics
Abstract/Summary:
In multivariate linear regression models, variable selection is very important. The number of variables is generally constrained by two considerations: the accuracy and the interpretability of the model. A larger number of independent variables captures more information about the response variable and so yields higher prediction accuracy, but too many variables reduce the interpretability and practical applicability of the model and greatly diminish the value of each independent variable; too few variables do not capture enough information about the response, so prediction accuracy drops significantly. Most research on variable selection starts from ordinary least squares and imposes additional constraints on the parameters to be estimated, that is, it adds a penalty function to obtain a penalized least squares method. The shrinkage effect of the constraints drives some of the estimated parameters to exactly 0, which achieves variable selection. Classical methods of this kind include the LASSO, the adaptive LASSO, SCAD, and the elastic net. This thesis takes the influence of random factors into account and establishes a new penalty function and penalized least squares estimation method.

First, the thesis reviews the development of variable selection methods and the basic idea of achieving variable selection by adding a penalty function. It analyzes in detail the construction, penalty functions, advantages, and disadvantages of the LASSO, adaptive LASSO, SCAD, and elastic net algorithms. Owing to the characteristics of the LASSO, the set of variables it selects can be biased, and it performs very poorly in the presence of multicollinearity. The adaptive LASSO improves on the LASSO: its estimated coefficients are sparser and it selects fewer independent variables. SCAD goes further: it not only selects fewer independent variables, but its estimator also satisfies a series of desirable properties, namely sparsity, unbiasedness, continuity, and the Oracle property. The elastic net is a variable selection method that combines the LASSO with classical ridge regression; its main advantage is that it handles the group effect among independent variables.

Second, because the Gamma and Weibull distributions are two important lifetime distributions with wide application, the random factors affecting the parameters are assumed to follow a Gamma distribution and a Weibull distribution, respectively, and a new penalty function and penalized least squares estimation method are proposed. The new penalty function is constructed through hierarchical maximum likelihood estimation; the thesis discusses the properties of the penalty function, gives a method for parameter estimation, and proves that the new penalized least squares estimator satisfies the Oracle property.

Finally, the new variable selection method is evaluated through a case study. With mean squared error and mean absolute error as evaluation indices, a classic example from the earlier literature is analyzed, and the indices are computed and compared with the results of the LASSO, adaptive LASSO, SCAD, and elastic net algorithms. The new algorithm shows a clear advantage in the sparse case, where it outperforms the other algorithms; in the non-sparse case its performance does not differ significantly from that of the adaptive LASSO.
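
The penalized least squares idea summarized in the abstract can be illustrated with a minimal sketch of the plain LASSO, which is one of the baseline methods named there and not the thesis's new Gamma/Weibull-based penalty. The sketch assumes the objective (1/2n)·||y − Xβ||² + λ·||β||₁ with no intercept, solves it by cyclic coordinate descent, and reports the mean squared error and mean absolute error. The synthetic data, the coefficient vector `true_beta`, the value `lam=0.5`, and all function names are hypothetical choices made for illustration only.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrinks z toward 0 by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta initialised to 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that ignores column j's own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical sparse example: only three of the eight coefficients are non-zero.
rng = random.Random(0)
true_beta = [3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
n = 200
X = [[rng.gauss(0.0, 1.0) for _ in true_beta] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, 0.5) for row in X]

beta_hat = lasso_coordinate_descent(X, y, lam=0.5)
y_hat = [sum(x * b for x, b in zip(row, beta_hat)) for row in X]

print("estimated coefficients:", [round(b, 3) for b in beta_hat])
print("MSE:", mse(y, y_hat), "MAE:", mae(y, y_hat))
```

Because the soft-thresholding step returns exactly 0 whenever the covariance term falls below λ, coefficients of irrelevant variables drop out of the model rather than merely becoming small, which is the selection mechanism the abstract describes. The other penalties mentioned (adaptive LASSO, SCAD, elastic net) would replace this thresholding rule with their own.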
Keywords/Search Tags: Variable Selection, Gamma Distribution, Weibull Distribution, LASSO, Adaptive LASSO