
Explorations And Implementations For Improving Deep Learning By Numerical Methods

Posted on: 2024-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: W D Luo
Full Text: PDF
GTID: 2558307079493094
Subject: Computer Science and Technology
Abstract/Summary:
In deep learning, input data flows through a neural network and the network produces output, so a neural network can be viewed as a dynamical system. Ordinary differential equations (ODEs) can be used to describe the behavior of dynamical systems. This suggests a connection between deep learning and ODEs, and theoretical foundations from the ODE field can be used to analyze problems in deep learning. Based on this, this thesis focuses on the connection between deep learning and ODEs and studies optimizers and neural network structures in deep learning. Specifically, the thesis addresses the following three aspects.

Firstly, the optimizer has an essential impact on the accuracy, robustness, and generalization of neural networks, and the study of optimizers is a hot topic in deep learning. This thesis attempts to improve the stochastic gradient descent (SGD) optimizer using third-order Lagrange-type discrete formulas. From the perspective of numerical methods, SGD can be understood as the discrete form of the Euler forward method. Considering the large truncation error of the Euler forward method, a third-order Lagrange-type discrete formula with higher accuracy is used to improve the SGD optimizer, and the Lagrange-type stochastic gradient descent (LSGD) optimizer is proposed. The performance of LSGD is then verified in benchmark experiments. Experimental results show that the LSGD optimizer does not converge. Finally, the reasons why LSGD cannot converge are analyzed using zero stability and consistency, the experimental results are explained, and the foundations are laid for the next chapter.

Subsequently, building on the work of the second chapter, this thesis proposes a high-order stochastic gradient descent (HSGD) optimizer by using high-order discrete formulas that satisfy zero stability and consistency to improve the SGD optimizer. The convergence of HSGD is proved mathematically. The performance of the HSGD optimizer is then evaluated on text classification and image recognition tasks. Experimental results show that HSGD outperforms SGD. The performance gains of HSGD over SGD verify the feasibility and superiority of improving the optimizer from the perspective of numerical methods.

Finally, according to the relationship between neural network structure and the discretization of ODEs, this thesis analyzes the traditional recurrent neural network (RNN) structure from the perspective of numerical methods and argues that it has an iterative form similar to the Euler forward method. With this relation, the Taylor-type recurrent neural network (T-RNN) model is proposed based on a third-order Taylor-type discrete method with higher accuracy. T-RNN's performance improvement over RNN is verified on multiple natural language processing (NLP) tasks, including sentiment classification, text classification, and statistical language modeling. In addition, motivated by phenomena observed in the deep learning experiments, this thesis conducts numerical experiments to analyze the characteristics of the discrete formulas, further confirming the relationship between neural networks and ODEs.
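For readers unfamiliar with the analogy used in the second chapter, the correspondence between SGD and the Euler forward method can be made concrete as follows; this is the standard gradient-flow reading, not a derivation specific to this thesis. Gradient descent on a loss L(\theta) follows the gradient-flow ODE

    d\theta/dt = -\nabla L(\theta),

and applying the Euler forward method with step size h gives

    \theta_{k+1} = \theta_k + h f(\theta_k) = \theta_k - h \nabla L(\theta_k),

which is exactly the SGD update with learning rate h once the full gradient is replaced by a stochastic mini-batch gradient. The Euler forward method has local truncation error O(h^2) and global error O(h), which is the "large truncation error" that motivates replacing it with a higher-order discrete formula.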
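The zero stability and consistency analysis mentioned above refers to standard properties of linear multistep methods; the summary below is textbook Dahlquist theory rather than the thesis's own derivation. A general explicit k-step method for y' = f(t, y) reads

    \sum_{j=0}^{k} \alpha_j y_{n+j} = h \sum_{j=0}^{k-1} \beta_j f(t_{n+j}, y_{n+j}),   with \alpha_k = 1.

Writing \rho(\zeta) = \sum_j \alpha_j \zeta^j and \sigma(\zeta) = \sum_j \beta_j \zeta^j, the two properties are:

    consistency:     \rho(1) = 0  and  \rho'(1) = \sigma(1);
    zero stability:  every root of \rho(\zeta) = 0 satisfies |\zeta| \le 1, and roots with |\zeta| = 1 are simple.

By the Dahlquist equivalence theorem, a linear multistep method converges if and only if it is both consistent and zero stable. A formula that violates the root condition amplifies perturbations geometrically as the iteration proceeds, which matches the role zero stability plays in explaining the divergence of LSGD and in the requirement imposed on the formulas used to build HSGD.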
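As an illustration of what a zero-stable, consistent high-order update can look like in code, the sketch below applies the classical three-step Adams-Bashforth formula to the gradient-flow ODE. The function name, the use of NumPy, and the choice of Adams-Bashforth coefficients are illustrative assumptions on my part; the actual HSGD formula and implementation are the ones described in the thesis, not this code.

    import numpy as np

    def multistep_sgd_step(theta, grad_history, lr):
        """One parameter update using the explicit three-step Adams-Bashforth
        formula applied to d(theta)/dt = -grad L(theta):

            theta_{n+1} = theta_n - lr/12 * (23 g_n - 16 g_{n-1} + 5 g_{n-2})

        grad_history holds the three most recent stochastic gradients,
        oldest first.  As an ODE solver, Adams-Bashforth-3 is consistent,
        zero stable, and third-order accurate.
        """
        g_nm2, g_nm1, g_n = grad_history   # g_{n-2}, g_{n-1}, g_n
        return theta - lr * (23.0 * g_n - 16.0 * g_nm1 + 5.0 * g_nm2) / 12.0

    # Until three gradients have been collected, a caller would typically fall
    # back to the plain SGD step theta - lr * g_n, i.e. the Euler forward step.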
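The analogy drawn in the final part of the abstract can also be stated briefly; this is a sketch of the general idea, and the precise construction of T-RNN is given in the thesis itself. The Euler forward method and a vanilla RNN are both one-step recursions:

    Euler forward:  y_{n+1} = y_n + \Delta t \, f(t_n, y_n),
    vanilla RNN:    h_t = \phi(W_h h_{t-1} + W_x x_t + b),

so in both cases the new state is computed from the immediately preceding state alone. A third-order Taylor-type discrete formula draws on higher-order terms of the local expansion of the solution rather than a single first-order step, which is the direction in which T-RNN extends the standard recurrent cell.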
Keywords/Search Tags: Deep learning, network structure, optimizer, ordinary differential equation, recurrent neural network