Distributed Logistic Regression

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: P S Shi
GTID: 2517306095469404
Subject: Statistics
Abstract/Summary:
In this thesis, we first study distributed Logistic regression for large-scale data that is partitioned across different networked computers. Based on the Alternating Direction Method of Multipliers (ADMM) algorithm, we transform the Logistic regression problem into a multistep iterative process and propose a Distributed Logistic algorithm whose communication cost is controllable. Specifically, in each iteration every computer updates its local estimator and simultaneously exchanges it with its neighbors. We then prove the convergence of the Distributed Logistic algorithm. Because the computer network is decentralized, the algorithm does not depend on a single central node and is therefore robust. Its classification results are the same as those of the non-distributed approach. Numerical studies show that the approach is both effective and efficient and performs well in distributed analysis of massive data.

We then study a differentially private Lasso method for massive data. Because Lasso is unstable, the differential privacy framework cannot be applied to it directly. Based on the functional mechanism, we study a Lasso method with functional perturbation solved by the ADMM algorithm. Specifically, in each ADMM iteration only one step accesses the dataset directly, so noise needs to be added only in that step, and the resulting algorithm satisfies differential privacy. For data sets with a large sample size and low dimension, the variables selected by the differentially private Lasso are similar to those selected by the ordinary Lasso even when the privacy budget is small. As the data dimension increases, the privacy budget must also increase before the differentially private Lasso produces results similar to those of the Lasso. When the sample size is too small, the method cannot accomplish variable selection. Experiments demonstrate the effectiveness of the method in processing large-scale data.
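The ADMM-based Distributed Logistic algorithm described in the abstract, where each computer updates a local estimator and exchanges it only with its neighbors, can be illustrated with a minimal sketch. It assumes a standard decentralized consensus ADMM iteration, a small ridge term (added so the pooled problem has a unique solution), Newton steps for the local subproblem, and a simulated ring of four nodes; the thesis's exact update rules may differ, and all function names here are hypothetical. The pooled (non-distributed) fit is computed alongside so the two can be compared.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_update(A, y, alpha_i, mid_sum, deg, c, lam, x0, newton_steps=10):
    """Newton's method on the (assumed) local ADMM subproblem
    min_x  logistic_loss(A, y, x) + lam/2 ||x||^2 + alpha_i^T x
           + c * sum_{j in N(i)} ||x - (x_i + x_j)/2||^2,
    where mid_sum = sum_j (x_i + x_j)/2 and deg = |N(i)|."""
    x = x0.copy()
    eye = np.eye(A.shape[1])
    for _ in range(newton_steps):
        prob = sigmoid(A @ x)
        grad = A.T @ (prob - y) + lam * x + alpha_i + 2.0 * c * (deg * x - mid_sum)
        hess = A.T @ (A * (prob * (1.0 - prob))[:, None]) + (lam + 2.0 * c * deg) * eye
        x = x - np.linalg.solve(hess, grad)
    return x


def decentralized_logistic(data, neighbors, c=1.0, lam_total=1e-2, iters=300):
    """Decentralized consensus ADMM: each node refits on its own data using its
    neighbours' previous estimates, then updates its dual variable."""
    n_nodes, p = len(data), data[0][0].shape[1]
    lam = lam_total / n_nodes            # split the ridge term evenly across nodes
    X = np.zeros((n_nodes, p))           # row i = local estimate held by node i
    alpha = np.zeros((n_nodes, p))       # row i = dual variable held by node i
    for _ in range(iters):
        X_prev = X.copy()                # all nodes update simultaneously
        for i, (A, y) in enumerate(data):
            nb = neighbors[i]
            mid_sum = sum((X_prev[i] + X_prev[j]) / 2.0 for j in nb)
            X[i] = local_update(A, y, alpha[i], mid_sum, len(nb), c, lam, X_prev[i])
        for i in range(n_nodes):         # only neighbour estimates are exchanged
            alpha[i] += c * sum(X[i] - X[j] for j in neighbors[i])
    return X


def centralized_logistic(A, y, lam_total=1e-2, newton_steps=50):
    """Reference fit on the pooled data with the same ridge-penalised objective."""
    x = np.zeros(A.shape[1])
    eye = np.eye(A.shape[1])
    for _ in range(newton_steps):
        prob = sigmoid(A @ x)
        grad = A.T @ (prob - y) + lam_total * x
        hess = A.T @ (A * (prob * (1.0 - prob))[:, None]) + lam_total * eye
        x = x - np.linalg.solve(hess, grad)
    return x


# Usage: 4 nodes on a ring, each holding 100 simulated observations.
rng = np.random.default_rng(0)
n_nodes, n_per, p = 4, 100, 3
beta_true = np.array([1.0, -2.0, 0.5])
data = []
for _ in range(n_nodes):
    A = rng.normal(size=(n_per, p))
    y = (rng.random(n_per) < sigmoid(A @ beta_true)).astype(float)
    data.append((A, y))
neighbors = [[(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)]

X_local = decentralized_logistic(data, neighbors)
x_pooled = centralized_logistic(np.vstack([A for A, _ in data]),
                                np.concatenate([y for _, y in data]))
print(np.max(np.abs(X_local - x_pooled)))  # largest deviation from the pooled fit
```

Each node only ever reads its own `(A, y)` and its neighbours' estimates, so per-iteration communication is one vector per edge; the number of ADMM iterations is what controls the total communication cost in this sketch.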
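The differentially private Lasso described in the abstract relies on the fact that, in ADMM for Lasso, only one step touches the data. A minimal sketch follows, assuming squared-error loss, features and responses scaled into [-1, 1], and the functional mechanism applied once to the polynomial coefficients of the least-squares objective, with the noisy coefficients then reused in every ADMM iteration. The sensitivity bound, the eigenvalue clipping (post-processing, which costs no extra privacy budget), and the function names are assumptions for illustration; the thesis may place or calibrate the noise differently.

```python
import numpy as np


def soft_threshold(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)


def admm_lasso(M, v, lam, rho=None, iters=500):
    """ADMM for min_b 0.5 b^T M b - v^T b + lam ||b||_1, split as b = z.
    With M = X^T X and v = X^T y this is the ordinary Lasso; the data enter
    only through (M, v), i.e. only the b-update touches the dataset."""
    d = len(v)
    if rho is None:
        rho = max(np.trace(M) / d, 1e-8)            # penalty on the same scale as M
    lhs = M + rho * np.eye(d)
    z = np.zeros(d)
    u = np.zeros(d)
    for _ in range(iters):
        b = np.linalg.solve(lhs, v + rho * (z - u))  # data-dependent step
        z = soft_threshold(b + u, lam / rho)         # l1 proximal step
        u = u + b - z                                # scaled dual update
    return z


def functional_mechanism_stats(X, y, epsilon, rng):
    """Add Laplace noise to the coefficients of sum_i (y_i - x_i^T b)^2.
    Assumes |x_ij| <= 1 and |y_i| <= 1. Per tuple the perturbed coefficients are
    -2 y x_j, x_j^2 and 2 x_j x_k (j < k), whose absolute values sum to at most
    2d + d^2, so replacing one tuple changes them by at most 2(d + 1)^2 in L1."""
    n, d = X.shape
    scale = 2.0 * (d + 1) ** 2 / epsilon
    XtX, Xty = X.T @ X, X.T @ y
    lin = -2.0 * Xty + rng.laplace(0.0, scale, size=d)       # degree-1 coefficients
    quad = np.triu(2.0 * XtX, k=1) + np.diag(np.diag(XtX))    # degree-2 coefficients
    iu = np.triu_indices(d)
    quad[iu] += rng.laplace(0.0, scale, size=len(iu[0]))
    # Symmetric M with b^T M b equal to the noisy quadratic part.
    off = np.triu(quad, k=1) / 2.0
    M = np.diag(np.diag(quad)) + off + off.T
    # Post-processing: clip negative eigenvalues so the subproblem stays convex.
    w, V = np.linalg.eigh(M)
    M = (V * np.maximum(w, 0.0)) @ V.T
    return M, -lin / 2.0


# Usage: large n, low d, simulated data scaled into [-1, 1].
rng = np.random.default_rng(1)
n, d = 20000, 5
beta_true = np.array([0.4, -0.3, 0.0, 0.0, 0.2])
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.clip(X @ beta_true + 0.1 * rng.normal(size=n), -1.0, 1.0)
lam = 200.0

beta_lasso = admm_lasso(X.T @ X, X.T @ y, lam)
M, v = functional_mechanism_stats(X, y, epsilon=1.0, rng=rng)
beta_dp = admm_lasso(M, v, lam)
print(beta_lasso)
print(beta_dp)
```

Because the Laplace scale grows like (d + 1)^2 / epsilon while the entries of X^T X and X^T y grow with n, this sketch also shows why a larger dimension calls for a larger privacy budget, and why a small sample leaves the noise dominant.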
Keywords/Search Tags: Distributed, Logistic regression, ADMM algorithm, Differential privacy, Lasso