
Research on Key Technologies of DNN Hybrid Parallel Training

Posted on: 2022-07-16    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Ye    Full Text: PDF
GTID: 2558307169982259    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep learning has developed rapidly and has been widely applied in many fields. As accuracy requirements for neural network models keep rising, the volume of training data and the number of model parameters continue to grow, so a single computing device can no longer support the training of large-scale neural network models. This poses a major challenge to the parallel modes and parallel efficiency of intelligent computing. Traditional schemes such as data parallelism and model parallelism can no longer meet the efficiency requirements of large-scale neural network training, and hybrid parallel training has become a current research hotspot.

Although many distributed parallel training techniques have been proposed to improve the training efficiency of large-scale neural network models, they still face serious challenges in memory overhead, scaling efficiency, and programmability. On the one hand, excessive memory consumption limits the computational efficiency of GPU devices; on the other hand, communication overhead prevents training from scaling efficiently to more nodes, limiting training throughput. To evaluate parallel methods more reasonably, this thesis proposes a memory efficiency (ME) metric that weighs throughput against memory overhead. In addition, most parallel methods still present technical obstacles when deploying distributed training and demand considerable programming skill from researchers. This thesis therefore studies hybrid parallel training of deep neural networks from three aspects: memory efficiency, scaling efficiency, and programmability. The main work and innovations are as follows:

To address the high memory overhead and poor scalability of distributed training, a hybrid parallel training scheme, HIPPIE, based on computation scheduling and communication scheduling is proposed. The scheme combines pipeline parallelism and data parallelism and designs a computation scheduling algorithm and a communication scheduling algorithm. The computation scheduling algorithm changes the relative order of forward and backward computations in the original parallel method so that intermediate data can be released earlier, relieving memory pressure. The communication scheduling algorithm changes the relative order of communication and computation, overlapping gradient synchronization with computation, which effectively improves scaling efficiency. Compared with mainstream data parallel methods, HIPPIE increases throughput by up to 80% while saving 57% of memory overhead, yielding a 4.18× memory efficiency. Moreover, on a 16-GPU training platform, its scaling efficiency stays above 90%.
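The abstract does not state the exact formula behind the memory efficiency (ME) metric. The minimal sketch below assumes the simplest weighting, throughput divided by peak memory, which is consistent with the numbers above: an 80% throughput gain combined with 57% memory savings gives 1.80 / 0.43 ≈ 4.18×. The function name and the baseline figures are illustrative, not taken from the thesis.

# Minimal sketch, assuming ME = throughput per unit of peak device memory.
# This is not necessarily the thesis's exact definition; it is one weighting
# that reproduces the 4.18x figure quoted above.

def memory_efficiency(throughput_samples_per_s: float, peak_memory_gb: float) -> float:
    """Higher is better: samples processed per second per GB of device memory."""
    return throughput_samples_per_s / peak_memory_gb

# Hypothetical baseline (data parallel) vs. HIPPIE-style numbers:
# +80% throughput and -57% peak memory relative to that baseline.
baseline = memory_efficiency(100.0, 10.0)
hippie = memory_efficiency(100.0 * 1.80, 10.0 * (1 - 0.57))
print(f"relative ME: {hippie / baseline:.2f}x")  # ~4.19x, i.e. roughly the 4.18x above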
To address the technical obstacles of implementing hybrid parallel training, an automatic parallelization technique for distributed training is proposed. Taking memory efficiency as the optimization objective, it designs an adaptive model partition algorithm that automatically produces an optimal model partition scheme suited to the HIPPIE method. The technique also encapsulates steps such as device mapping and communication grouping, simplifying the deployment of distributed parallel training. The experiments mainly evaluate the partitioning quality of the model partition algorithm: compared with the mainstream strategy of partitioning equally by parameter count, the algorithm increases throughput by up to 42.5%, saves 22.5% of memory overhead, and improves memory efficiency by up to 83%.
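The abstract does not describe the adaptive model partition algorithm itself, only that it optimizes memory efficiency rather than splitting layers equally by parameter count. As a rough illustration of the general idea, the sketch below partitions contiguous layers into pipeline stages so that the most heavily loaded stage is as light as possible, given any per-layer cost (profiled time, activation memory, and so on). The function names and the example costs are hypothetical and not taken from the thesis.

from typing import List

def _stages_needed(costs: List[float], limit: float) -> int:
    """Greedy count of contiguous stages if no stage may exceed `limit`."""
    stages, current = 1, 0.0
    for c in costs:
        if current + c > limit:
            stages += 1
            current = c
        else:
            current += c
    return stages

def balanced_partition(costs: List[float], num_stages: int) -> List[List[float]]:
    """Split `costs` into at most `num_stages` contiguous groups, minimizing the largest group sum."""
    lo, hi = max(costs), float(sum(costs))
    # Binary search on the bottleneck (largest allowed stage cost); `hi` always stays feasible.
    while hi - lo > 1e-6:
        mid = (lo + hi) / 2
        if _stages_needed(costs, mid) <= num_stages:
            hi = mid
        else:
            lo = mid
    # Rebuild the grouping greedily under the found bottleneck `hi`.
    groups, current = [[]], 0.0
    for c in costs:
        if current + c > hi and groups[-1]:
            groups.append([])
            current = 0.0
        groups[-1].append(c)
        current += c
    return groups

# Hypothetical per-layer costs for an 8-layer model split across 4 pipeline stages.
print(balanced_partition([4, 1, 1, 2, 3, 3, 1, 5], 4))  # [[4, 1, 1], [2, 3], [3, 1], [5]]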
Keywords/Search Tags: Deep Learning, Distributed Training, Data Parallelism, Pipeline Parallelism, Hybrid Parallelism