
Algorithm Research On Escaping Saddle Points In High-dimensional Non-convex Optimization

Posted on: 2020-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tao
Full Text: PDF
GTID: 2370330599959129
Subject: Operational Research and Cybernetics
Abstract/Summary:
The rapid development of artificial intelligence is inseparable from the theory of mathematical optimization. Machine learning problems can be cast as optimization problems of fitting a data set, and these problems are typically high-dimensional and non-convex. In high dimensions, second-order Hessian information and higher-order information are expensive to compute. In the non-convex case, saddle points are surrounded by high-error plateaus that can dramatically slow down learning and give the illusory impression of a local minimum, and the ratio of the number of saddle points to local minima grows exponentially with the dimensionality. How to quickly identify and escape saddle points without increasing the computational cost has therefore become a hot topic in high-dimensional non-convex optimization.

Stochastic gradient descent (SGD) is a basic algorithm in machine learning optimization that speeds up the iterations during training, and all of the algorithms in this paper are variants of SGD. The perturbed gradient descent (PGD) algorithm adds perturbations to the iterates in the stuck region around a saddle point, thereby freeing the iteration from the saddle point. Combining the advantages of SGD and PGD, we propose a perturbed stochastic gradient descent (PSGD) algorithm that enjoys both fast iterations and perturbation-based saddle-point avoidance. The cubic regularization (CR) algorithm resolves the failure mode of Newton's method in which the Hessian degenerates into a singular matrix and the iteration converges to a saddle point, but CR requires second-order Hessian information. In this paper, we combine CR with SGD to construct a stochastic cubic regularization (SCR) algorithm, which does not need to compute the Hessian explicitly and only requires the corresponding Hessian-vector products.

We use the open-source dataset a9a to build a non-convex logistic regression model for numerical experiments. The results show that SGD stops decreasing the objective near the saddle point, whereas PSGD, thanks to the gradient perturbation, avoids the saddle point and continues to decrease it. Compared with the first two algorithms, SCR requires fewer iterations and converges faster, because it exploits second-order Hessian information (obtained through Hessian-vector products) and can therefore identify and escape the saddle point earlier.
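To illustrate the two ideas the abstract relies on, the following is a minimal Python sketch, not the thesis's actual implementation or its a9a logistic-regression experiment: perturbed gradient descent on a hypothetical toy function with a saddle point, plus a Hessian-vector product computed by finite differences of gradients so the full Hessian is never formed (the ingredient SCR uses). The toy objective, step size, noise radius, and threshold are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy non-convex objective with a saddle point at the origin:
# f(x, y) = x^2 - y^2 + 0.1 * (x^4 + y^4)
def f(w):
    x, y = w
    return x**2 - y**2 + 0.1 * (x**4 + y**4)

def grad(w):
    x, y = w
    return np.array([2*x + 0.4*x**3, -2*y + 0.4*y**3])

def hessian_vector_product(w, v, eps=1e-5):
    # Approximate H(w) @ v by a central difference of gradients,
    # so the Hessian matrix itself is never built.
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

def psgd(w0, lr=0.05, noise_radius=0.1, grad_tol=1e-3, steps=500, seed=0):
    # Perturbed gradient descent sketch: when the gradient is nearly zero
    # (a possible saddle point), inject a random perturbation from a small
    # ball so the iterate can leave the flat "stuck" region. Note that this
    # simple rule also fires at minima; gradient steps then pull the iterate back.
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        g = grad(w)
        if np.linalg.norm(g) < grad_tol:
            direction = rng.normal(size=w.shape)
            w = w + noise_radius * direction / np.linalg.norm(direction)
        else:
            w = w - lr * g
    return w

if __name__ == "__main__":
    # Starting on the saddle's stable manifold (y = 0), plain gradient
    # descent would stall at the origin; the perturbation lets PSGD escape
    # toward one of the minima near (0, ±sqrt(5)).
    w_final = psgd(np.array([0.5, 0.0]))
    print("final iterate:", w_final, "f =", f(w_final))
    # Hessian-vector product at the final point, without forming H:
    print("Hv:", hessian_vector_product(w_final, np.array([1.0, 0.0])))
```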
Keywords/Search Tags:Non-convex optimization, Saddle points, SGD, Cubic regularization