
Runtime Optimization for Large-Scale Neural-Network Data-Parallel Training

Posted on: 2020-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: C F Jia
Full Text: PDF
GTID: 2428330578983122
Subject: Computer system architecture
Abstract/Summary:
Artificial intelligence (AI) technology has matured after years of research. Deep learning, which is built on artificial neural networks, has become a research hotspot in the AI field owing to its outstanding results. As deep learning has developed, neural network models have grown more complex, and the sample data used for training has grown rapidly as well. Because of the parametric nature of neural networks, the more data used for training and the more training iterations performed, the better the final result; however, this greatly increases the amount of computation and thus the training time. Targeting multi-node CPU+GPU computing platforms, this thesis optimizes the design of the distributed training runtime of a deep learning framework, improving the effectiveness of large-scale deep learning training and the utilization of cluster resources, and ultimately reducing training time. The main contents and results of this thesis are:

1. The open-source deep learning framework TensorFlow is migrated from TCP/IP to an RDMA implementation, improving data transmission bandwidth between nodes in a distributed environment. This thesis first ports the gRPC communication framework used by TensorFlow to RDMA; a second approach is then taken, in which TensorFlow's data transmission layer is directly replaced with an RDMA implementation. In the final tests, the optimized TensorFlow reaches the maximum bandwidth the hardware can support when transmitting large blocks of data. Building on the engineering experience gained during this work, the thesis also delivers a standalone RDMA communication framework, so that other applications with similar requirements can be ported and optimized quickly.

2. Several optimization schemes are designed and implemented for the computation and communication patterns of distributed data parallelism, so that distributed deep learning training completes efficiently. The thesis mainly uses a software-pipeline scheme to hide the communication latency of parameter synchronization, and further improves per-GPU training speed with a mixed-precision training scheme. Finally, it corrects the behavior of batch normalization in the distributed setting. With a series of adjustments to the optimization algorithm and hyperparameters, the proposed schemes are validated on the ImageNet dataset.

Some of the research reports and technical results of this work have been open-sourced and have attracted the attention of many developers in the open-source community. The conclusions of this thesis also provide a reference for the performance optimization of distributed deep learning training on domestic processors.
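Two of the ideas in the second contribution can be illustrated in code. The first is a minimal sketch (not the thesis implementation) of the software-pipeline scheme: each layer's gradient all-reduce is handed to a communication thread as soon as that gradient is produced, so communication overlaps with the backward computation of the layers below. The per-layer `backward` method and the `allreduce` callable are hypothetical stand-ins for the framework's real operators and the RDMA collective.

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined_backward(layers, loss_grad, allreduce):
    """Back-propagate layer by layer; hand each parameter gradient to a
    communication thread immediately, so the all-reduce overlaps with
    the backward pass of the layers below it."""
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending, upstream = [], loss_grad
        for layer in reversed(layers):
            # Hypothetical per-layer API: returns (parameter gradient,
            # gradient to propagate to the previous layer).
            param_grad, upstream = layer.backward(upstream)
            pending.append(comm.submit(allreduce, param_grad))
        # All gradients must be synchronized before the optimizer step.
        for fut in pending:
            fut.result()
```

The batch-normalization correction addresses the fact that, under data parallelism, each GPU sees only a small shard of the global batch, so locally computed statistics are noisy. A common fix, sketched below under the assumption of a sum all-reduce primitive (`allreduce_sum` is hypothetical), is to aggregate per-worker sums before computing the mean and variance, so the statistics reflect the global batch:

```python
import numpy as np

def sync_batch_norm_stats(x, allreduce_sum):
    """Compute batch-norm mean/variance over the global batch by
    all-reducing per-worker sums instead of using local statistics.
    x: local activations of shape (local_batch, features)."""
    n_global = allreduce_sum(np.array([x.shape[0]], dtype=np.float64))[0]
    mean = allreduce_sum(x.sum(axis=0)) / n_global
    var = allreduce_sum((x * x).sum(axis=0)) / n_global - mean ** 2
    return mean, var
```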
Keywords/Search Tags: Deep Learning, Distributed Training, Data Parallelism, RDMA