
A Stochastic Variance Reduction Gradient Method With Adaptive Learning Rate

Posted on: 2021-02-07    Degree: Master    Type: Thesis
Country: China    Candidate: G M Chen    Full Text: PDF
GTID: 2480306557498124    Subject: Mathematics
Abstract/Summary:
Machine learning and its derivative intelligent systems are widely used in medical diagnosis, transportation, and other fields, and research on machine learning models and methods has accordingly received great attention. Research on machine learning and related topics is therefore of both theoretical and practical importance. The stochastic gradient descent (SGD) method is a popular method for solving empirical risk minimization (ERM) problems in machine learning. Unlike the gradient descent method, each iteration of SGD is very cheap, involving only the computation of the gradient of a single randomly chosen sample. However, due to the variance introduced by random sampling, SGD attains only a sublinear convergence rate on strongly convex problems, whereas the gradient descent method attains a linear convergence rate. This slower, sublinear convergence is a notable drawback of SGD from an optimization standpoint, and many methods focus on reducing the variance and improving the convergence rate of SGD.

As an improved variant of SGD, the stochastic variance reduction gradient (SVRG) method effectively reduces the variance caused by random sampling. SVRG has attracted extensive attention because of its linear convergence rate on strongly convex optimization problems and its low storage demand. However, SVRG uses a fixed constant as the learning rate, which ignores the differences between the dimensions of the gradient vector. In this paper, we propose a stochastic variance reduction gradient method with an adaptive learning rate (AdaSVRG). This method uses historical gradient information to adaptively adjust the learning rate of SVRG, so that each dimension of the iterative update has its own dynamic learning rate. In addition, to reduce the impact of the initialization settings on the method's performance, an initialization bias correction is applied. The convergence analysis shows that AdaSVRG has a linear convergence rate when solving strongly convex optimization problems.

Inspired by the momentum acceleration trick for accelerating first-order optimization methods, combining momentum terms with stochastic optimization methods has become a research hotspot. However, these momentum-based stochastic optimization methods usually also choose a fixed constant as the learning rate. In this paper, the snapshot momentum term and the growing inner-loop-size strategy of the fast stochastic variance reduction gradient (FSVRG) method are combined with AdaSVRG, yielding a fast stochastic variance reduction gradient method with an adaptive learning rate (AdaFSVRG). The convergence analysis shows that AdaFSVRG also has a linear convergence rate when the objective function satisfies the strong convexity assumption. Numerical experiments on the Adult, Default, Statlog, and Nursery data sets show that, for the ERM problem, AdaSVRG and AdaFSVRG require fewer iterations than SVRG and FSVRG, respectively, to reach the same optimality gap.
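To illustrate the general idea (not the thesis' exact algorithm), the Python sketch below combines SVRG's variance-reduced stochastic gradient with an Adam/AdaGrad-style per-coordinate adaptive step size and an initialization bias correction, which is the kind of update the abstract describes. The function name ada_svrg_epoch, the gradient oracle grad_fn, and the hyper-parameters eta, beta, and eps are illustrative assumptions; the precise update rules are given in the full text.

    # Illustrative sketch only: one outer epoch of an AdaSVRG-style method.
    # Assumptions: grad_fn(w, i) returns the gradient of the i-th sample's loss at w,
    # and w is a NumPy array. Hyper-parameter names are hypothetical.
    import numpy as np

    def ada_svrg_epoch(w, grad_fn, n_samples, inner_iters=100,
                       eta=0.1, beta=0.9, eps=1e-8, rng=None):
        """One outer epoch: SVRG variance reduction + per-coordinate adaptive steps."""
        if rng is None:
            rng = np.random.default_rng()
        # Snapshot point and full gradient, as in standard SVRG.
        w_snap = w.copy()
        full_grad = np.mean([grad_fn(w_snap, i) for i in range(n_samples)], axis=0)

        v = np.zeros_like(w)  # accumulated squared-gradient statistics
        for t in range(1, inner_iters + 1):
            i = rng.integers(n_samples)
            # SVRG variance-reduced stochastic gradient.
            g = grad_fn(w, i) - grad_fn(w_snap, i) + full_grad
            # Exponential moving average of squared gradients (per coordinate).
            v = beta * v + (1.0 - beta) * g**2
            v_hat = v / (1.0 - beta**t)  # initialization bias correction
            # Per-coordinate adaptive step, instead of SVRG's fixed constant step.
            w = w - eta * g / (np.sqrt(v_hat) + eps)
        return w

According to the abstract, AdaFSVRG further adds FSVRG's snapshot momentum term and a growing inner-loop size on top of this adaptive scheme; those components are not reproduced in the sketch above.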
Keywords/Search Tags:Stochastic gradient method, variance reduction, adaptive learning rate, initialization bias correction, momentum acceleration