A Distributed Algorithm For Lasso Variable Selection

Posted on:2022-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:W J Zeng

Full Text:PDF

GTID:2480306479493124

Subject:Statistics

Abstract/Summary:


Regularization is a common approach to variable selection in machine learning and is used for sparse regression problems. When the sample size is large or massive data are stored on different machines, distributed computing is an important way to reduce computing time and improve efficiency. In this thesis, new distributed algorithms are constructed for variable selection in the Lasso and AR-Lasso models, respectively. The algorithms are designed for sample data stored across a distributed cluster of storage processors coordinated by a central processor. First, regularization methods for variable selection and distributed optimization algorithms are reviewed, and the background material used in this thesis is briefly introduced. Then, a new distributed algorithm for Lasso variable selection is studied. Based on an equivalent optimization model and the idea of alternating-step iteration, a distributed algorithm for Lasso variable selection is constructed, and its convergence is proved. To demonstrate the applicability of the algorithm, numerical experiments on sparse linear regression problems with large sample sets are performed. The experiments show that the proposed distributed algorithm outperforms cyclic coordinate descent and the ADMM algorithm in both computation time and accuracy. Further, the idea behind the distributed Lasso algorithm is extended to the AR-Lasso variable selection model: building on the AR-Lasso variable selection model constructed by Fan and Emre (2012), a distributed algorithm is constructed and its convergence is proved. Finally, the distributed Lasso and distributed AR-Lasso models are compared in numerical experiments. The results show that when the sample errors are normally distributed, the distributed AR-Lasso model is far better than the distributed Lasso model at identifying non-zero parameters when n ≪ p; that is, the distributed AR-Lasso model can recover influential variables that the Lasso model mistakenly drops. When the sample errors are not normally distributed, the distributed AR-Lasso model is far more accurate than the distributed Lasso model at distinguishing zero from non-zero parameters.
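The sketch below illustrates the distributed setting described above. Data rows are held on several storage machines, and a central processor fits the Lasso. It assumes the squared-error objective 0.5·||y − Xβ||² + λ·||β||₁ and row-wise partitioned data. It also assumes p is small enough that a p×p Gram matrix fits on the central processor. This is not the thesis's alternating-step algorithm, which the abstract does not specify. It is a simpler one-round scheme. Each machine sends XᵀX and Xᵀy computed on its own rows, and the central processor runs cyclic coordinate descent, one of the baselines named above. The function names `local_statistics`, `aggregate` and `lasso_ccd` are hypothetical.

```python
"""Hypothetical sketch: one-round distributed Lasso.

Assumption: this is NOT the thesis's algorithm. Workers send sufficient
statistics (X_k^T X_k, X_k^T y_k); the centre runs cyclic coordinate descent
on 0.5*||y - X b||^2 + lam*||b||_1 using only the aggregated statistics.
"""


def soft_threshold(a, t):
    # S_t(a) = sign(a) * max(|a| - t, 0), the proximal map of t*|.|
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def local_statistics(X_block, y_block):
    # Runs on one storage machine: returns X_k^T X_k (p x p) and X_k^T y_k (length p).
    p = len(X_block[0])
    gram = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for row, yi in zip(X_block, y_block):
        for j in range(p):
            xty[j] += row[j] * yi
            for l in range(p):
                gram[j][l] += row[j] * row[l]
    return gram, xty


def aggregate(stats):
    # Runs on the central processor: sums the (gram, xty) pairs from all machines.
    p = len(stats[0][1])
    gram = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for g, c in stats:
        for j in range(p):
            xty[j] += c[j]
            for l in range(p):
                gram[j][l] += g[j][l]
    return gram, xty


def lasso_ccd(gram, xty, lam, n_iter=200, tol=1e-10):
    # Cyclic coordinate descent using only G = X^T X and c = X^T y.
    p = len(xty)
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if gram[j][j] == 0.0:
                continue  # all-zero column: its coefficient stays at 0
            # partial correlation: rho_j = c_j - sum_{l != j} G_jl * beta_l
            rho = xty[j] - sum(gram[j][l] * beta[l] for l in range(p) if l != j)
            new = soft_threshold(rho, lam) / gram[j][j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:  # stop once a full sweep barely moves beta
            break
    return beta


# Two machines, orthogonal design: the Lasso solution is soft_threshold(c, lam).
G, c = aggregate([local_statistics([[1.0, 0.0]], [3.0]),
                  local_statistics([[0.0, 1.0]], [0.5])])
print(lasso_ccd(G, c, lam=1.0))  # prints [2.0, 0.0]
```

This design needs only one communication round, but it costs O(p²) memory and communication per machine. It therefore suits n ≫ p. In high-dimensional regimes, iterative schemes such as consensus ADMM exchange only length-p vectors per round and are the usual alternative.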

Keywords/Search Tags:

Variable selection, Lasso, AR-Lasso, Distributed algorithm


Related items

1	Research On The Advantages And Disadvantages Of Lasso And Its Improved Methods In Variable Selection
2	Comparison And Analysis Of Variable Selection Methods In Classical Statistics And Machine Learning
3	Bi-level Variable Selection Methods Based On Lasso
4	A Distributed Algorithm For Lasso Variable Selection
5	The Lasso And Its Methods Of Model Selection In Generalized Linear Models
6	Comparison Of Several Methods For Generating Directed Acyclic Graph By Variable Selection
7	Research And Application Of Random LASSO Post-selective Inference Algorithm
8	Summary Of Lasso Variable Selection Methods
9	Application Of Lasso And Improved Lasso Method In Several Kinds Of Model Variables Selection
10	Bayesian Adaptive Square-root Lasso