Convergence Analysis Of Several Stochastic Gradient Descent Methods With Biased Stochastic Gradients

Posted on: 2022-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Luo
GTID: 2480306491459964
Subject: Computational Mathematics
Abstract/Summary:
Stochastic gradient methods are simple and efficient methods for solving large-scale optimization problems and have been widely applied in machine learning and deep learning. However, in contrast to the proliferation of new algorithms, theoretical development of stochastic gradient methods has lagged behind. Most existing convergence analyses of stochastic gradient methods rest on the assumption that the stochastic gradient is an unbiased estimate of the gradient of the objective function, a condition that is often not satisfied in practice. Moreover, the stepsizes chosen in most theoretical analyses are decreasing stepsizes satisfying the condition proposed by Robbins and Monro, which is inconsistent with common practice. This gap between the assumed conditions and the actual situation can make the theoretical results fail as practical guidance. It has been shown that, when solving strongly convex optimization problems, the convergence rate of stochastic gradient descent (SGD) under unbiased gradient estimation can be effectively improved by the α-suffix averaging procedure. However, previous studies have argued that SGD-α, the algorithm obtained by applying this procedure, cannot be computed on the fly. We generalize α-suffix averaging to a more general form, rounding α-suffix averaging, which can be computed on the fly, and obtain the SGD-rα algorithm by applying it to SGD. We then present convergence analyses of SGD, SGD-α, and SGD-rα under the assumption that the gradient estimate is biased, considering several different stepsize schemes. Last but not least, numerical experiments on real-world data verify the effectiveness of the algorithms and the theoretical analysis.
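To make the averaging procedures concrete, below is a minimal Python sketch of SGD with α-suffix averaging alongside an on-the-fly variant. The abstract does not spell out the thesis's definition of rounding α-suffix averaging, so the doubling-based restart scheme sketched here, along with the function names and the 1/t stepsize, is an illustrative assumption rather than the author's exact algorithm.

```python
import numpy as np

def sgd_suffix_avg(grad_fn, x0, T, eta0, alpha=0.5):
    """SGD with alpha-suffix averaging: return the average of the last
    ceil(alpha*T) iterates. The averaging window depends on the total
    iteration count T, so T must be fixed in advance."""
    x = x0.copy()
    suffix_start = T - int(np.ceil(alpha * T))
    avg, count = np.zeros_like(x0), 0
    for t in range(1, T + 1):
        x = x - (eta0 / t) * grad_fn(x)  # decreasing 1/t stepsize
        if t > suffix_start:             # accumulate only the suffix
            avg += x
            count += 1
    return avg / count

def sgd_rounded_suffix_avg(grad_fn, x0, T, eta0):
    """Hypothetical 'rounding' variant (an assumption, not the thesis's
    definition): restart the running average at iterations 1, 2, 4, 8, ...
    so that by the end of each epoch the average covers roughly the most
    recent half of the iterates, using O(1) memory and no advance
    knowledge of T."""
    x = x0.copy()
    avg, count, next_restart = np.zeros_like(x0), 0, 1
    for t in range(1, T + 1):
        x = x - (eta0 / t) * grad_fn(x)
        if t == next_restart:            # "round": drop the old prefix
            avg, count = np.zeros_like(x0), 0
            next_restart *= 2
        avg += x
        count += 1
    return avg / count

# Toy usage: strongly convex f(x) = 0.5*||x||^2, whose true gradient is x,
# observed with noise plus a small constant bias (the biased setting).
rng = np.random.default_rng(0)
grad = lambda x: x + 0.01 + 0.1 * rng.standard_normal(x.shape)
x_hat = sgd_rounded_suffix_avg(grad, np.ones(5), T=10_000, eta0=1.0)
print(np.linalg.norm(x_hat))  # settles near the bias level, not at 0
```

The design contrast this sketch illustrates: the fixed-window average needs either T up front or storage of past iterates, whereas a restart scheme maintains a suffix-like average incrementally, which is what makes an on-the-fly SGD-rα-style computation possible.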
Keywords/Search Tags:Stochastic gradient method, Biased gradient estimation, Averaging procedure, Convergence, Deep learning