
A Stochastic Variance Reduction Gradient Method With Adaptive Learning Rate

Posted on: 2021-02-07    Degree: Master    Type: Thesis
Country: China    Candidate: G M Chen    Full Text: PDF
GTID: 2480306557498124    Subject: Mathematics
Abstract/Summary:
Machine learning and its derivative intelligent systems are widely used in medical diagnosis, transportation, and other fields, and research on machine learning models and methods has accordingly received great attention. Research on machine learning and related topics is therefore of both theoretical and practical importance. The stochastic gradient descent (SGD) method is a popular method for solving empirical risk minimization (ERM) problems in machine learning. Unlike the gradient descent method, each iteration of SGD is very cheap, involving only the computation of the gradient of a single randomly chosen sample. However, due to the variance introduced by random sampling, SGD attains only a sublinear convergence rate on strongly convex problems, whereas the gradient descent method attains a linear convergence rate. This slower, sublinear convergence is a notable drawback of SGD from an optimization standpoint, and many methods focus on reducing the variance and improving the convergence rate of SGD.

As an improved variant of SGD, the stochastic variance reduction gradient (SVRG) method effectively reduces the variance caused by random sampling. SVRG has attracted extensive attention because of its linear convergence rate on strongly convex optimization problems and its low storage demand. However, SVRG uses a fixed constant as the learning rate, which ignores the differences between the dimensions of the gradient vector. In this paper, we propose a stochastic variance reduction gradient method with an adaptive learning rate (AdaSVRG). This method uses historical gradient information to adaptively adjust the learning rate of SVRG, so that each dimension of the iterative update has its own dynamic learning rate. In addition, to reduce the impact of the initialization settings on the method's performance, an initialization bias correction is applied. The convergence analysis shows that AdaSVRG has a linear convergence rate when solving strongly convex optimization problems.

Inspired by the momentum acceleration trick for accelerating first-order optimization methods, combining momentum terms with stochastic optimization methods has become a research hotspot. However, these momentum-based stochastic optimization methods usually also choose a fixed constant as the learning rate. In this paper, the snapshot momentum term and the growing inner-loop-size strategy of the fast stochastic variance reduction gradient (FSVRG) method are combined with AdaSVRG, yielding a fast stochastic variance reduction gradient method with an adaptive learning rate (AdaFSVRG). The convergence analysis shows that AdaFSVRG also has a linear convergence rate when the objective function satisfies the strong convexity assumption. Numerical experiments on the Adult, Default, Statlog, and Nursery data sets show that, for the ERM problem, AdaSVRG and AdaFSVRG require fewer iterations than SVRG and FSVRG, respectively, to reach the same optimality gap.
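To illustrate the general idea (not the thesis' exact algorithm), the Python sketch below combines SVRG's variance-reduced stochastic gradient with an Adam/AdaGrad-style per-coordinate adaptive step size and an initialization bias correction, which is the kind of update the abstract describes. The function name ada_svrg_epoch, the gradient oracle grad_fn, and the hyper-parameters eta, beta, and eps are illustrative assumptions; the precise update rules are given in the full text.

    # Illustrative sketch only: one outer epoch of an AdaSVRG-style method.
    # Assumptions: grad_fn(w, i) returns the gradient of the i-th sample's loss at w,
    # and w is a NumPy array. Hyper-parameter names are hypothetical.
    import numpy as np

    def ada_svrg_epoch(w, grad_fn, n_samples, inner_iters=100,
                       eta=0.1, beta=0.9, eps=1e-8, rng=None):
        """One outer epoch: SVRG variance reduction + per-coordinate adaptive steps."""
        if rng is None:
            rng = np.random.default_rng()
        # Snapshot point and full gradient, as in standard SVRG.
        w_snap = w.copy()
        full_grad = np.mean([grad_fn(w_snap, i) for i in range(n_samples)], axis=0)

        v = np.zeros_like(w)  # accumulated squared-gradient statistics
        for t in range(1, inner_iters + 1):
            i = rng.integers(n_samples)
            # SVRG variance-reduced stochastic gradient.
            g = grad_fn(w, i) - grad_fn(w_snap, i) + full_grad
            # Exponential moving average of squared gradients (per coordinate).
            v = beta * v + (1.0 - beta) * g**2
            v_hat = v / (1.0 - beta**t)  # initialization bias correction
            # Per-coordinate adaptive step, instead of SVRG's fixed constant step.
            w = w - eta * g / (np.sqrt(v_hat) + eps)
        return w

According to the abstract, AdaFSVRG further adds FSVRG's snapshot momentum term and a growing inner-loop size on top of this adaptive scheme; those components are not reproduced in the sketch above.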
Keywords/Search Tags:Stochastic gradient method, variance reduction, adaptive learning rate, initialization bias correction, momentum acceleration