
Research on Stochastic Gradient Descent Optimization Algorithms Based on Convolutional Neural Networks

Posted on: 2021-04-16    Degree: Master    Type: Thesis
Country: China    Candidate: T Tan    Full Text: PDF
GTID: 2370330611964275    Subject: Computer application technology
Abstract/Summary:
After years of accumulation, artificial intelligence technology has become increasingly mature, and its application fields have continued to expand. Among them, deep learning based on neural networks has become a research focus in this field because of its outstanding results. In deep learning, the performance of a convolutional neural network depends largely on its model structure and its learning algorithm. Once the structure of the convolutional neural network model is fixed, the parameters connecting the neurons directly determine the final performance of the model. As the most basic learning algorithm for adjusting model parameters, Stochastic Gradient Descent (SGD) has become an indispensable part of practical deep learning engineering.

At the level of the underlying computation, SGD can update the model parameters of a convolutional neural network either serially or in parallel. An analysis of SGD reveals two main problems. First, in serial SGD the learning rate is fixed, and choosing an appropriate learning rate is difficult: if the learning rate is too small, convergence is very slow; if it is too large, SGD easily causes large oscillations of the model parameters during the iterations and may even prevent the model from converging. Second, parallel SGD can run either synchronously or asynchronously. Compared with synchronous parallelism, asynchronous parallelism runs faster, but Asynchronous Stochastic Gradient Descent (ASGD) suffers from gradient delay. Gradient delay degrades the model's convergence speed and accuracy; in severe cases it causes large, abrupt parameter updates at particular points, may prevent the model from converging, and can cause the entire training run to fail.

To address the difficulty of selecting the learning rate in serial SGD, this paper presents an adaptive learning rate optimization algorithm for convolutional neural networks. To address the gradient delay problem in parallel SGD, this paper presents a gradient delay optimization algorithm for convolutional neural networks. The effectiveness of both optimization algorithms is verified experimentally. The main research contents are as follows:

1. An adaptive learning rate optimization algorithm for convolutional neural networks. This paper analyzes the parameter update formula of serial SGD and confirms that selecting the learning rate in serial SGD is difficult. To address this problem, the paper presents ACADG, an adaptive learning rate algorithm that accelerates convergence. During model iteration, ACADG distinguishes two cases according to the sign of g_{t-1} · g_t (where g is the gradient and t is the iteration step) and applies a different update rule to the model parameters in each case. Comparison with the Adam and Amsgrad algorithms shows that ACADG is the best of the three in terms of convergence, convergence speed, and accuracy. Therefore, ACADG achieves the effect of adaptively adjusting the learning rate in the serial update of the model parameters.
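The abstract does not give ACADG's exact per-case update formulas, so the following Python sketch only illustrates the stated case split on the sign of g_{t-1} · g_t. The Adam-style moment estimates and the scaling factors `up` and `down` are assumptions introduced here for illustration, not the thesis's actual algorithm.

```python
# Illustrative sketch of the sign-based branching idea described for ACADG.
# Only the case split on g_{t-1} * g_t comes from the abstract; the per-branch
# update (an Adam-style step with a scaled learning rate) is an assumption.
import numpy as np

class ACADGSketch:
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                 up=1.1, down=0.5):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.up, self.down = up, down          # hypothetical scaling factors
        self.m = self.v = self.prev_g = None
        self.t = 0

    def step(self, w, g):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
            self.prev_g = np.zeros_like(w)
        self.t += 1
        # Adam-style first and second moment estimates with bias correction.
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # Case split on the sign of g_{t-1} * g_t (elementwise):
        # same sign -> consistent descent direction, allow a larger step;
        # opposite sign -> oscillation, shrink the step.
        scale = np.where(self.prev_g * g >= 0, self.up, self.down)
        w = w - self.lr * scale * m_hat / (np.sqrt(v_hat) + self.eps)
        self.prev_g = g.copy()
        return w
```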
2. A gradient delay optimization algorithm for convolutional neural networks. This paper analyzes the parameter update formula of the ASGD algorithm and confirms that ASGD suffers from gradient delay. To address this problem, the paper presents DASGD, an asynchronous stochastic gradient descent algorithm that dynamically adjusts stale gradients. The basic idea of DASGD is to dynamically compute the weights of the delayed-gradient term and the momentum term according to the degree of delay of the parameter gradient from each worker, thereby dynamically compensating for the gradient delay (an illustrative sketch of this idea is given after the results summary below). Comparison with the ASGD and MDCASGD algorithms shows that DASGD handles gradient delay more effectively and is the best of the compared algorithms in terms of model accuracy, loss value, and convergence under high-delay conditions. Therefore, DASGD addresses the gradient delay problem in the asynchronous update of model parameters.

3. Comparative experiments verifying the effectiveness of the adaptive learning rate optimization algorithm ACADG and the gradient delay optimization algorithm DASGD. For ACADG, this paper reports three experiments, on a synthetic loss function, on the MNIST data set, and on the CIFAR-10 data set, with the Adam and Amsgrad algorithms as baselines. For DASGD, the paper reports two experiments, on the CIFAR-10 data set and the Tiny-ImageNet data set, with the ASGD and MDCASGD algorithms as baselines.

The comparative experiments and the analysis of their results show the following. Compared with the Adam and Amsgrad algorithms, the adaptive learning rate optimization algorithm ACADG presented in this paper is the best in convergence, convergence speed, and accuracy. On the MNIST test data, the accuracy of ACADG with a CNN model is 3.12% and 2.81% higher than that of the Amsgrad and Adam algorithms, respectively; on the CIFAR-10 test data, the accuracy of ACADG with a CNN model is 15.59% and 1.99% higher than that of the Amsgrad and Adam algorithms, respectively. Compared with the ASGD and MDCASGD algorithms, the gradient delay optimization algorithm DASGD presented in this paper is the best in accuracy, loss, and convergence under high delay. On the CIFAR-10 test data, the accuracy of DASGD with the LeNet-5 model is 3.736% higher than that of the MDCASGD algorithm when the number of workers is 12; on the Tiny-ImageNet test data, the top-1 accuracy of DASGD with the VGG16 model is 3.525% higher than that of the MDCASGD algorithm when the number of workers is 6.

Therefore, the two stochastic gradient descent optimization algorithms presented in this paper are effective for adjusting the parameters of convolutional neural network models: they improve the convergence speed, accuracy, and robustness to gradient delay of the convolutional neural network model, and they also support the further application of deep learning in artificial intelligence.
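As a supplement to research content 2, the following Python sketch illustrates the staleness-weighted blending idea described for DASGD. The abstract does not give the exact weighting formula, so the parameter-server structure and the weight function lambda(tau) = 1 / (1 + tau) used here are assumptions made for illustration only.

```python
# Illustrative sketch of the staleness-weighted update described for DASGD.
# The abstract only states that the weights of the delayed-gradient term and
# the momentum term are computed from each worker's delay; the weight function
# below is an assumed example, not the thesis's exact formula.
import numpy as np

class DASGDServerSketch:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr, self.momentum = lr, momentum
        self.velocity = None
        self.version = 0               # global parameter version counter

    def apply_gradient(self, w, grad, grad_version):
        if self.velocity is None:
            self.velocity = np.zeros_like(w)
        # Staleness: number of global updates applied since the worker read
        # the parameters it used to compute this gradient.
        tau = self.version - grad_version
        lam = 1.0 / (1.0 + tau)        # assumed weight: staler gradients count less
        # Blend the momentum (history) term and the delayed gradient term,
        # giving the stale gradient weight lam and the history weight (1 - lam).
        self.velocity = (1.0 - lam) * self.momentum * self.velocity + lam * grad
        w = w - self.lr * self.velocity
        self.version += 1
        return w
```

In an asynchronous setting, each worker would record the parameter version it pulled, compute its gradient, and send both back; the server then applies `apply_gradient` in arrival order, so gradients with larger staleness are automatically down-weighted.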
Keywords/Search Tags:Deep learning, Convolutional Neural Network, Stochastic Gradient Descent, Adaptive learning rate, Gradient delay