In the big data setting, the storage capacity, computing power, and privacy constraints of a single machine can make traditional centralized optimization infeasible, so large data sets are instead stored, processed, and analyzed across multiple machines. To reduce computational complexity and communication cost and to accelerate convergence in such a distributed environment, this paper proposes two distributed optimization algorithms based on the conjugate gradient method, applied to the optimization problems of the linear regression model and the logistic regression model, respectively. The work consists of the following two parts.

First, for the optimization of large-scale linear regression models, a distributed conjugate gradient algorithm is proposed; its key ingredient is a distributed approximation of the step-size formula. Building on the conjugate gradient method, a communication-efficient procedure is designed in which each iteration requires only two rounds of communication and only scalars and vectors are transmitted between machines, so the communication cost is low. It is proved that, when the data satisfy certain conditions, the distributed conjugate gradient algorithm converges linearly. Simulation experiments show that, after a sufficient number of iterations, the algorithm matches the performance of the centralized algorithm and has better convergence behavior than the distributed alternating direction method of multipliers (ADMM). Finally, experiments on real data verify the feasibility and effectiveness of the distributed conjugate gradient algorithm.

Second, for the optimization of large-scale logistic regression models, a distributed restart conjugate gradient algorithm is proposed. It differs from the distributed conjugate gradient algorithm in two respects: a restart technique is added to the distributed conjugate gradient method to improve efficiency, and, since the step size can no longer be obtained by an exact one-dimensional line search, it is instead determined by the Armijo criterion. In simulation experiments, the error of the distributed restart conjugate gradient algorithm is compared with that of the centralized algorithm, and the two are found to perform equally well. In addition, the relationship between the algorithm's error and the total sample size is analyzed.

In summary, this paper proposes a distributed conjugate gradient algorithm and a distributed restart conjugate gradient algorithm for the linear regression model and the logistic regression model, respectively, to solve the distributed optimization of their loss functions, so the research has both practical and theoretical significance. At present, there is little literature on using conjugate gradient methods to solve optimization problems in distributed systems; this work therefore extends the scope of application of the conjugate gradient method to some degree and has practical value.
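To make the first contribution concrete, the following is a minimal single-process sketch of a conjugate gradient solver for row-partitioned least squares, written in Python with NumPy. It is not the paper's exact algorithm: the function name, the block-list interface, and the use of plain Python sums in place of cross-machine aggregation (e.g., an allreduce) are illustrative assumptions, and the paper's specific two-round communication scheme and approximate step-size formula are not reproduced here. The sketch only shows the general pattern in which each machine contributes a local vector X_j^T(X_j p) and a few scalars per iteration.

```python
import numpy as np

def distributed_cg(X_blocks, y_blocks, n_iter=50, tol=1e-10):
    """Sketch: conjugate gradient for min_beta 0.5*||X beta - y||^2,
    with the rows of (X, y) partitioned across workers as blocks.
    Sums over blocks stand in for cross-machine aggregation."""
    d = X_blocks[0].shape[1]
    beta = np.zeros(d)

    # Setup aggregation: b = X^T y, assembled from local contributions.
    b = sum(Xj.T @ yj for Xj, yj in zip(X_blocks, y_blocks))

    r = b.copy()          # residual of the normal equations X^T X beta = b
    p = r.copy()          # search direction
    rs_old = r @ r
    for _ in range(n_iter):
        # Per-iteration aggregation: each worker sends the d-vector
        # X_j^T (X_j p); their sum equals A p with A = X^T X.
        Ap = sum(Xj.T @ (Xj @ p) for Xj in X_blocks)
        alpha = rs_old / (p @ Ap)       # exact step size for a quadratic
        beta = beta + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p   # conjugate direction update
        rs_old = rs_new
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)
    beta_hat = distributed_cg(np.array_split(X, 4), np.array_split(y, 4))
    print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))
```

Because only the d-dimensional vector X_j^T(X_j p) and a few scalars leave each machine, the per-iteration communication volume is independent of the local sample size, which is the property the abstract emphasizes.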
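For the second contribution, the sketch below illustrates a restarted nonlinear conjugate gradient method with Armijo backtracking for logistic regression over row-partitioned data, again as a single-process simulation in Python. The Fletcher-Reeves update, the fixed restart period, the backtracking constants, and the helper names are all illustrative assumptions rather than the paper's exact choices; aggregating local losses and gradients by summation stands in for the communication steps.

```python
import numpy as np

def local_loss_grad(Xj, yj, beta):
    """Local logistic loss and gradient on one block (labels in {0, 1})."""
    z = Xj @ beta
    loss = np.sum(np.logaddexp(0.0, z) - yj * z)
    sigma = 0.5 * (1.0 + np.tanh(0.5 * z))      # numerically stable sigmoid
    return loss, Xj.T @ (sigma - yj)

def distributed_restart_cg(X_blocks, y_blocks, n_iter=200,
                           restart_every=10, c1=1e-4, tol=1e-6):
    """Sketch: restarted Fletcher-Reeves CG with Armijo backtracking.
    Sums over blocks stand in for cross-machine aggregation."""
    d = X_blocks[0].shape[1]
    beta = np.zeros(d)

    def global_loss_grad(b):
        # Aggregation: sum of local losses (scalar) and gradients (vector).
        parts = [local_loss_grad(Xj, yj, b)
                 for Xj, yj in zip(X_blocks, y_blocks)]
        return sum(p[0] for p in parts), sum(p[1] for p in parts)

    f, g = global_loss_grad(beta)
    p = -g
    for k in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ p >= 0:                  # fall back to steepest descent
            p = -g
        slope = g @ p
        alpha = 1.0
        for _ in range(50):             # Armijo backtracking line search
            f_new, g_new = global_loss_grad(beta + alpha * p)
            if f_new <= f + c1 * alpha * slope:
                break
            alpha *= 0.5
        beta = beta + alpha * p
        if (k + 1) % restart_every == 0:
            p = -g_new                  # periodic restart to -gradient
        else:
            p = -g_new + (g_new @ g_new) / (g @ g) * p   # Fletcher-Reeves
        f, g = f_new, g_new
    return beta
```

The two mechanisms named in the abstract are both visible here: the restart step periodically resets the search direction to the negative gradient, and the step size comes from Armijo backtracking rather than an exact one-dimensional line search, since the logistic loss admits no closed-form step size.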