Variable Selection And Application For Linear Regression Models In Complex Data Setting

Posted on: 2022-06-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S W Xia
Full Text: PDF
GTID: 1480306536961239
Subject: Statistics
Abstract/Summary:
The linear regression model is one of the best-known models in statistical regression and is widely applied in finance, biology, medicine, and other fields. By constructing models and estimating their coefficients, we can explain the relationship between covariates and response variables, and also carry out statistical inference and prediction for real problems. Meanwhile, in many real regression problems the response variables are multivariate. As an extension of the classical linear model, the multivariate linear regression model has been studied and applied extensively. In this dissertation, we mainly study variable selection and its application in univariate and multivariate linear regression models. Specifically, the research consists of the following four parts.

Chapter 2 studies the problem of simultaneous nonnegative estimation and variable selection in sparse high-dimensional linear models. Based on the SCAD penalty function, we propose the nonnegative SCAD estimator under a nonnegativity constraint. We adopt the Multiplicative Updates algorithm to compute the solution, which makes the computation fast. In the theoretical part, we show that, under weak conditions, the proposed method is consistent for variable selection and asymptotically normal. Finally, numerical simulations and a real data analysis are carried out to verify the efficiency of the proposed method.

In Chapter 3, we consider a high-dimensional linear regression problem in which there are complex correlation structures among the predictors. We propose a graph-constrained regularization procedure, named Sparse Laplacian Shrinkage with the Graphical Lasso Estimator (SLS-GLE). The procedure uses the estimated precision matrix to describe the conditional dependence pattern among the predictors, and encourages sparsity in both the regression model and the graphical model. We introduce a Laplacian quadratic penalty that incorporates the graph information, and discuss in detail the advantages of using the precision matrix to construct the Laplacian matrix. Theoretical properties and numerical comparisons show that the proposed method improves both model interpretability and estimation accuracy. We also apply the method to a financial problem and show that the proposed procedure is successful in asset selection.

In Chapter 4, we study variable selection in the multivariate regression model in a high-dimensional data setting with complex structure. In practice, the correlation structures within the data are complex and interact with one another through the regression function. The proposed method, Interaction Pursuit Biconvex Optimization (IPBO), explores the regression relationship while allowing the predictors and responses to be drawn from different multivariate normal distributions with general covariance matrices. The method handles the complex data structure by building a structured sparsity penalty that encourages shared structure between the network and the regression coefficients. We prove theoretical results under interpretable conditions and provide an efficient algorithm to compute the estimator. Simulation studies and real data examples compare the proposed method with several existing methods and indicate that IPBO works well.

In Chapter 5, we present a new approach for multivariate regression problems in a complex data setting. The proposed method, named Integrated Precision Matrix Estimation (IPME), is formulated as a biconvex optimization problem, which we solve by an efficient two-step minimization. In this method, the responses and predictors are no longer seen as two separate parts but are considered as a whole within a Gaussian graphical model, in which the edges of the graph correspond to the nonzero coefficients of the multivariate regression model. We prove statistical theoretical results for IPME. Numerical comparisons with several existing methods show that the proposed method works effectively in both model selection and estimation. We apply the method to financial data, obtain adequate allocations, and show that IPME is successful in asset allocation selection.
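
To make the Chapter 2 idea more concrete, the following Python sketch combines a SCAD penalty with a nonnegativity constraint and a multiplicative update. It is only an illustration under stated assumptions and is not the dissertation's actual algorithm: the objective scaling (1/2)||y - X beta||^2, the local linear approximation of the SCAD penalty, the constant a = 3.7, the all-ones starting value, and the names `scad_derivative` and `nonnegative_scad` are all assumptions introduced here. The update itself is a standard multiplicative rule for nonnegative quadratic programming.

```python
import numpy as np


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty for t >= 0 (a = 3.7 is a common default)."""
    t = np.asarray(t, dtype=float)
    # For lam < t <= a*lam the slope decays linearly; beyond a*lam it is zero.
    middle = np.maximum(a * lam - t, 0.0) / (a - 1.0)
    return np.where(t <= lam, lam, middle)


def nonnegative_scad(X, y, lam, a=3.7, n_iter=500, eps=1e-12):
    """Hypothetical sketch: nonnegative SCAD via multiplicative updates.

    Assumed objective: 0.5 * ||y - X beta||^2 + sum_j SCAD(beta_j), beta >= 0.
    Each iteration linearizes the SCAD penalty at the current beta, which
    gives a nonnegative quadratic program 0.5 * beta'A beta + b'beta, and
    then applies one multiplicative update to that program.
    """
    A = X.T @ X
    c = X.T @ y
    A_pos = np.maximum(A, 0.0)   # positive part of A
    A_neg = np.maximum(-A, 0.0)  # negative part of A, so A = A_pos - A_neg
    beta = np.ones(X.shape[1])   # strictly positive start (an assumption)
    for _ in range(n_iter):
        w = scad_derivative(beta, lam, a)  # linearized penalty weights
        b = w - c                          # linear term of the surrogate
        p = A_pos @ beta
        q = A_neg @ beta
        # The factor is nonnegative, so beta never leaves the feasible set.
        beta = beta * (-b + np.sqrt(b * b + 4.0 * p * q)) / (2.0 * p + eps)
    return beta
```

Because the update only rescales the current value, a coefficient generally shrinks toward zero without reaching it exactly when the design is not orthogonal, so a small threshold may be needed to read off the selected variables. The scaling of the loss and the choice of `lam` would also have to match whatever convention the actual estimator uses.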
Keywords/Search Tags:Linear regression model, Multivariate regression model, Variable selection, High-dimensional data, Nonnegative estimation