Stochastic gradient descent (SGD) and its noisy variants are widely used in machine learning. Following the principle of gradient descent, this family of iterative optimization algorithms decreases the loss function along the direction of steepest descent. The stochastic approximation method proposed by Robbins and Monro in 1951 inspired much of the subsequent development of stochastic gradient algorithms. In the modern big-data setting, SGD is a key method for training neural networks, processing large-scale data sets, and solving optimization problems, and it is widely adopted across many fields. The overall performance of an algorithm is central to statistical learning theory. Inspired by Chen et al., the sum of the generalization error and the optimization error can be used to measure this overall performance; in fact, the generalization error and the optimization error measure overfitting and underfitting, respectively. In this paper, we study the stability and convergence of iterative optimization algorithms for deep learning such as SGD. Stability of an algorithm means that its output does not depend too heavily on any single training example, and it is closely related to generalization ability, so quantitative estimates of the generalization error are of both theoretical and practical significance. For stochastic gradient methods, characterizing how SGD affects the generalization error under non-convex learning objectives is of great theoretical interest and is crucial for understanding generalization in deep learning. Unlike existing studies, we focus on the generalization error bounds of the weighted output of the iterative algorithm and on the size of the optimization error. We assume that the loss function is sub-Gaussian. Based on the standard setting, the random-subset setting, and the duplicate-sample setting, we derive generalization bounds for the iterative algorithm using the tools of unconditional mutual information, separated mutual information, and conditional mutual information. Finally, these bounds are applied to concrete algorithms. To verify the results of the analysis, we conduct experiments on the MNIST and CIFAR-10 datasets. Numerical simulation results show that the weighted output consistently performs better on non-convex optimization problems.
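As a concrete illustration of the weighted output discussed above, the following minimal Python sketch runs plain SGD on a toy objective while tracking a uniformly weighted running average of the iterates. It is only a sketch under simplifying assumptions (scalar parameter, uniform weights); the function and variable names are illustrative and not the paper's notation.

```python
import random

def sgd_with_weighted_average(grad, w0, step_size, n_steps, seed=0):
    """Plain SGD that also tracks a uniformly weighted average of the iterates.

    grad(w, rng) should return a stochastic gradient at w.
    Returns (last iterate, weighted average of all iterates).
    """
    rng = random.Random(seed)
    w = w0
    avg = 0.0
    for t in range(1, n_steps + 1):
        g = grad(w, rng)         # stochastic gradient at the current iterate
        w = w - step_size * g    # standard SGD update
        avg += (w - avg) / t     # running average: avg_t = (1/t) * sum_{s<=t} w_s
    return w, avg

# Toy example: minimize f(w) = w^2 / 2 with noisy gradients g = w + noise.
noisy_grad = lambda w, rng: w + rng.gauss(0.0, 1.0)
last, weighted = sgd_with_weighted_average(noisy_grad, w0=5.0,
                                           step_size=0.1, n_steps=2000)
```

With noisy gradients the last iterate keeps fluctuating around the minimizer, while the averaged output smooths out this noise; this averaging effect is one intuition for why the weighted output can enjoy better generalization and optimization behavior.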