A Distributed Algorithm For Lasso Variable Selection

Posted on: 2022-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: W J Zeng
Full Text: PDF
GTID: 2480306479493124
Subject: Statistics
Abstract/Summary:
Regularization is a common approach to variable selection in machine learning and is widely used for sparse regression problems. Distributed computing is an important way to reduce computation time and improve efficiency when the sample size is large or when massive data sets are stored on different machines. In this thesis, new distributed algorithms are constructed for variable selection in the Lasso and AR-Lasso models, respectively. The algorithms are suited to sample data stored across the storage nodes of a distributed cluster coordinated by a central processor. First, regularization methods for variable selection and distributed optimization algorithms are reviewed, and the background material used in the thesis is briefly introduced. Then, a new distributed algorithm for Lasso variable selection is studied. Based on an equivalent optimization model and the idea of alternating iteration, a distributed algorithm for Lasso variable selection is constructed and its convergence is proved. To demonstrate the applicability of the algorithm, numerical experiments are performed on sparse linear regression problems with large sample sets. The experiments show that the proposed distributed algorithm outperforms cyclic coordinate descent and the ADMM algorithm in both computation time and accuracy. Further, the idea behind the distributed Lasso algorithm is extended to the AR-Lasso variable selection model. Based on the AR-Lasso variable selection model of Fan and Emre (2012), a distributed algorithm is constructed and its convergence is proved. Finally, the distributed Lasso and distributed AR-Lasso models are compared in numerical experiments. The results show that, when the sample error is normally distributed, the distributed AR-Lasso model is far better than the distributed Lasso model at identifying non-zero parameters when n ≪ p; that is, the distributed AR-Lasso model can recover influential variables that the Lasso model mistakenly drops. When the sample error is not normally distributed, the distributed AR-Lasso model is far more accurate than the distributed Lasso model at distinguishing zero from non-zero parameters.
Keywords/Search Tags:Variable selection, Lasso, AR-Lasso, Distributed algorithm
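
For orientation, the following is a minimal sketch of cyclic coordinate descent for the Lasso, one of the two baselines the abstract compares against. It is not the thesis's distributed algorithm, which the abstract does not spell out. The objective scaling (1/(2n))·||y − Xβ||² + λ·||β||₁, the idea of having each machine send only its local X^T X and X^T y to a central node, and all names (`local_stats`, `aggregate`, `lasso_cd`, the synthetic data, λ = 0.1) are assumptions made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def local_stats(X, y):
    """Summaries one machine could compute from its own rows: X^T X and X^T y."""
    p = len(X[0])
    gram = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return gram, xty


def aggregate(stats):
    """Central node: both summaries are additive over row blocks, so sum them."""
    p = len(stats[0][1])
    gram = [[sum(s[0][j][k] for s in stats) for k in range(p)] for j in range(p)]
    xty = [sum(s[1][j] for s in stats) for j in range(p)]
    return gram, xty


def lasso_cd(gram, xty, n, lam, n_sweeps=200):
    """Cyclic coordinate descent on (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    p = len(xty)
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # x_j^T (y - X beta) with coordinate j's own contribution added back
            rho = xty[j] - sum(gram[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(rho / n, lam) / (gram[j][j] / n)
    return beta


# Synthetic sparse regression problem (illustrative values only)
random.seed(0)
n, p = 200, 5
true_beta = [2.0, 0.0, -1.5, 0.0, 0.0]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0.0, 0.1) for row in X]

# Split the rows across 4 hypothetical machines, then combine their summaries
chunks = [(X[i:i + 50], y[i:i + 50]) for i in range(0, n, 50)]
gram, xty = aggregate([local_stats(Xk, yk) for Xk, yk in chunks])
beta_hat = lasso_cd(gram, xty, n, lam=0.1)
print(beta_hat)
```

Sending the p × p matrix X^T X to the central node is only practical when p is moderate. In the n ≪ p regime discussed in the abstract, a different communication scheme would be needed.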