
Research On Acceleration Techniques In First-Order Stochastic Algorithms

Posted on: 2023-09-24    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L L He    Full Text: PDF
GTID: 1520306917480044    Subject: Probability theory and mathematical statistics
Abstract/Summary:
First-order optimization algorithms, which rely only on gradient information, hold a significant competitive advantage over second-order methods and are widely used in machine learning, applied statistics, and engineering. Traditional deterministic first-order algorithms, however, are not well suited to large-scale machine learning tasks. Practice shows that many engineering applications do not require high-precision solutions, so first-order stochastic algorithms have a clear advantage when the required accuracy is modest, the data sample size is large, and the sample space is high-dimensional. Yet the large variance introduced by random sampling during the iterations slows the convergence of first-order stochastic algorithms. To improve their convergence rate, variance reduction techniques have been developed, and it has been shown theoretically that first-order stochastic algorithms equipped with variance reduction achieve a faster, linear convergence rate. In parallel, momentum-based acceleration techniques have been developed and applied to first-order stochastic algorithms, both in theory and in practice, promoting their use in large-scale machine learning.

Some shortcomings remain. First, the theoretical proofs and algorithmic frameworks of several existing algorithms are too complicated for theoretical understanding and practical application. Second, for some non-convex optimization models, the convergence theory of accelerated first-order stochastic algorithms is still incomplete, and this gap needs to be filled. It is therefore of great theoretical and practical importance to develop efficient, stable, and simple accelerated first-order stochastic algorithms with optimal convergence rates, especially for non-convex optimization models in machine learning. The main work and innovations of this dissertation are as follows:

1) A class of convex empirical risk minimization models is studied. We introduce the Katyusha momentum technique into the classical stochastic variance-reduced gradient (SVRG) framework and propose a negative-momentum-based stochastic variance-reduced gradient algorithm. Specifically, an additional Katyusha momentum step is performed at each iteration of the inner loop of SVRG to speed up convergence, and the original stochastic gradient descent step is replaced, equivalently, by a proximal-style iteration when the parameters are updated, which shrinks the distance between successive iterates and improves the stability of the algorithm. The theory shows that the proposed algorithm attains the optimal convergence rate O((?)). Finally, numerical experiments verify the effectiveness and superiority of the proposed algorithm.
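To make the inner-loop structure of this first contribution concrete, the following is a minimal sketch, in Python, of an SVRG-style epoch with a Katyusha (negative) momentum coupling toward the snapshot point. It assumes a smooth convex finite-sum objective f(x) = (1/n) Σ_i f_i(x) with user-supplied oracles grad_i and full_grad; the function name, parameter values, and snapshot-update rule are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

def katyusha_momentum_svrg(grad_i, full_grad, x0, n, eta=0.1, theta=0.5,
                           epochs=20, inner_iters=None, rng=None):
    """Hypothetical sketch: an SVRG epoch whose inner loop adds a Katyusha-style
    negative momentum step that couples each query point back to the snapshot."""
    rng = np.random.default_rng() if rng is None else rng
    m = inner_iters if inner_iters is not None else n   # inner-loop length, often ~n
    x_tilde = np.asarray(x0, dtype=float).copy()         # snapshot point
    z = x_tilde.copy()                                    # working iterate
    for _ in range(epochs):
        mu = full_grad(x_tilde)                           # full gradient at the snapshot
        for _ in range(m):
            # Katyusha (negative) momentum: pull the query point toward the snapshot
            y = theta * z + (1.0 - theta) * x_tilde
            i = int(rng.integers(n))
            # SVRG variance-reduced gradient estimator at the coupled point y
            g = grad_i(y, i) - grad_i(x_tilde, i) + mu
            # gradient step; equivalently the proximal iteration
            #   z = argmin_u  <g, u> + ||u - z||^2 / (2 * eta)
            z = z - eta * g
        # refresh the snapshot (one common choice; the dissertation's rule may differ)
        x_tilde = theta * z + (1.0 - theta) * x_tilde
    return x_tilde
```

The commented proximal form of the z-update reflects the abstract's remark that the stochastic gradient step can be rewritten equivalently as a proximal iteration on a quadratic model, which is what keeps successive iterates close together.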
2) A class of stochastic optimization models with a nonconvex smooth loss and a convex nonsmooth regularizer is studied. For this composite model, the classical momentum stochastic gradient technique is introduced into the original probabilistic gradient estimation algorithm, which was designed for non-composite problems, and a probability-based accelerated gradient estimation algorithm is proposed. The gradient update step of the probabilistic gradient estimator is combined with the momentum idea to accelerate convergence (a minimal sketch of this estimator is given below, after the last contribution). The theory shows that the proposed algorithm attains the optimal convergence rate (?). Finally, numerical experiments verify the effectiveness and superiority of the proposed algorithm.

3) A class of empirical risk minimization models with a convex smooth loss and a nonconvex nonsmooth regularizer is studied. We introduce the classical Nesterov inertial acceleration technique into the classical proximal stochastic variance-reduced DC (difference-of-convex) algorithm and propose an inertially accelerated proximal stochastic variance-reduced DC algorithm. The inertial acceleration step is inserted into the inner-loop iteration of the proximal stochastic variance-reduced DC algorithm, and the parameters are updated using the stochastic gradient evaluated at the inertial point. We give an asymptotic convergence analysis: with suitably chosen inertial coefficients and step sizes, every limit point of the sequence of iterates generated by the algorithm is a stationary point of the objective function. Finally, numerical experiments verify the superiority and reliability of the proposed algorithm.

4) Two types of fully nonconvex regularized optimization models are studied: the regularized empirical risk minimization model and the regularized expected risk minimization model. First, for the regularized empirical risk minimization model, the classical Nesterov inertial acceleration technique is combined with the proximal stochastic variance-reduced gradient method for the general composite model, and an inertially accelerated algorithm is proposed. Based on the supermartingale convergence theorem, the theory establishes almost-sure asymptotic convergence of the sequence of iterates generated by the proposed algorithm. Second, the regularized expected risk minimization model is solved with the previously proposed probabilistic accelerated gradient estimation algorithm, and the theory shows that the algorithm attains the optimal convergence rate (?). Finally, numerical experiments verify the superiority and reliability of the proposed algorithm.
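The following sketch, again in Python, illustrates the kind of probability-based accelerated gradient estimation referred to in contributions 2 and 4: with a small probability the full gradient is recomputed, otherwise the estimator is updated recursively from a minibatch, and the parameter update combines a momentum term with a proximal step for the nonsmooth regularizer. The soft-thresholding prox, the heavy-ball form of the momentum, the minibatch size, and all parameter values are assumptions made for illustration; the dissertation's exact update order and coupling may differ.

```python
import numpy as np

def prox_l1(v, tau):
    """Soft-thresholding: proximal operator of tau * ||.||_1 (an illustrative
    choice of convex nonsmooth regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def momentum_probabilistic_prox(grad_i, full_grad, x0, n, eta=0.05, beta=0.9,
                                p=0.1, lam=0.01, iters=1000, batch=8, rng=None):
    """Hypothetical sketch: a probabilistic (PAGE-style) gradient estimator
    combined with a heavy-ball momentum term and a proximal step for the
    nonsmooth regularizer."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    g = full_grad(x)                       # start from a full gradient
    v = np.zeros_like(x)                   # momentum buffer
    for _ in range(iters):
        v = beta * v - eta * g             # momentum-accelerated direction
        x_new = prox_l1(x + v, eta * lam)  # proximal step handles the nonsmooth term
        if rng.random() < p:
            g = full_grad(x_new)           # with probability p: refresh the estimator
        else:
            idx = rng.integers(n, size=batch)
            # otherwise: recursive minibatch correction of the running estimator
            g = g + np.mean([grad_i(x_new, i) - grad_i(x, i) for i in idx], axis=0)
        x = x_new
    return x
```

In this sketch the probability p trades the cost of full-gradient refreshes against the variance accumulated by the recursive branch; the minibatch size plays a similar role within that branch.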
Keywords/Search Tags: Nesterov momentum, Katyusha momentum, Variance reduction, Machine learning, Stochastic algorithm, Stochastic gradient descent, Proximal operator, Nonconvex optimization