
Algorithm Research On Escaping Saddle Points In High-dimensional Non-convex Optimization

Posted on: 2020-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tao
Full Text: PDF
GTID: 2370330599959129
Subject: Operational Research and Cybernetics
Abstract/Summary:
The rapid development of artificial intelligence is inseparable from the theory of mathematical optimization. Machine learning problems can be cast as optimization problems of fitting a data set, and these problems are typically high-dimensional and non-convex. In high dimensions, second-order Hessian information and higher-order information are expensive to compute. In the non-convex case, saddle points are surrounded by high-error plateaus that can dramatically slow down learning and give the illusory impression of a local minimum, and the ratio of the number of saddle points to local minima grows exponentially with the dimensionality. How to quickly identify and escape saddle points without increasing the computational cost has therefore become a hot topic in high-dimensional non-convex optimization.

Stochastic gradient descent (SGD) is a basic algorithm in machine learning optimization that speeds up the iterations during training, and all of the algorithms in this paper are variants of SGD. The perturbed gradient descent (PGD) algorithm adds perturbations to the iterates in the stuck region around a saddle point, thereby freeing the iteration from the saddle point. Combining the advantages of SGD and PGD, we propose a perturbed stochastic gradient descent (PSGD) algorithm that enjoys both fast iterations and perturbation-based saddle-point avoidance. The cubic regularization (CR) algorithm resolves the failure mode of Newton's method in which the Hessian degenerates into a singular matrix and the iteration converges to a saddle point, but CR requires second-order Hessian information. In this paper, we combine CR with SGD to construct a stochastic cubic regularization (SCR) algorithm, which does not need to compute the Hessian explicitly and only requires the corresponding Hessian-vector products.

We use the open-source dataset a9a to build a non-convex logistic regression model for numerical experiments. The results show that SGD stops decreasing the objective near the saddle point, whereas PSGD, thanks to the gradient perturbation, avoids the saddle point and continues to decrease it. Compared with the first two algorithms, SCR requires fewer iterations and converges faster, because it exploits second-order Hessian information (obtained through Hessian-vector products) and can therefore identify and escape the saddle point earlier.
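To illustrate the two ideas the abstract relies on, the following is a minimal Python sketch, not the thesis's actual implementation or its a9a logistic-regression experiment: perturbed gradient descent on a hypothetical toy function with a saddle point, plus a Hessian-vector product computed by finite differences of gradients so the full Hessian is never formed (the ingredient SCR uses). The toy objective, step size, noise radius, and threshold are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy non-convex objective with a saddle point at the origin:
# f(x, y) = x^2 - y^2 + 0.1 * (x^4 + y^4)
def f(w):
    x, y = w
    return x**2 - y**2 + 0.1 * (x**4 + y**4)

def grad(w):
    x, y = w
    return np.array([2*x + 0.4*x**3, -2*y + 0.4*y**3])

def hessian_vector_product(w, v, eps=1e-5):
    # Approximate H(w) @ v by a central difference of gradients,
    # so the Hessian matrix itself is never built.
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

def psgd(w0, lr=0.05, noise_radius=0.1, grad_tol=1e-3, steps=500, seed=0):
    # Perturbed gradient descent sketch: when the gradient is nearly zero
    # (a possible saddle point), inject a random perturbation from a small
    # ball so the iterate can leave the flat "stuck" region. Note that this
    # simple rule also fires at minima; gradient steps then pull the iterate back.
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        g = grad(w)
        if np.linalg.norm(g) < grad_tol:
            direction = rng.normal(size=w.shape)
            w = w + noise_radius * direction / np.linalg.norm(direction)
        else:
            w = w - lr * g
    return w

if __name__ == "__main__":
    # Starting on the saddle's stable manifold (y = 0), plain gradient
    # descent would stall at the origin; the perturbation lets PSGD escape
    # toward one of the minima near (0, ±sqrt(5)).
    w_final = psgd(np.array([0.5, 0.0]))
    print("final iterate:", w_final, "f =", f(w_final))
    # Hessian-vector product at the final point, without forming H:
    print("Hv:", hessian_vector_product(w_final, np.array([1.0, 0.0])))
```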
Keywords/Search Tags:Non-convex optimization, Saddle points, SGD, Cubic regularization