Training deep neural networks (DNNs) places heavy demands on device compute and memory capacity, and single-device training can no longer satisfy them. Parallel training splits the model and the data according to a split strategy and distributes them across multiple devices, meeting the resource demand and accelerating the training process. However, the complexity of DNNs and the large variety of parallel strategies make the optimal strategy hard to find, and the problem becomes worse in heterogeneous scenarios: the imbalance in compute capacity among heterogeneous devices demands extra fine-tuning effort, so model developers spend their time tuning hardware split groups instead of developing models. Automatic parallel training tools have been proposed to separate model development from the underlying parallel training details; the user only specifies a few critical split points, and the tool automatically searches for the optimal parallel strategy. However, existing automatic tools target specific models, which makes migrating them to new models a non-trivial task. In addition, some of them lack a general description of the split strategy, so it is difficult for framework developers to port one tool's algorithm to another.

This thesis proposes an automatic parallel training library for DNNs whose foci are usability and portability. On the usability side, the library acts as an optimizing graph pass: users annotate the critical tensors through a unified interface, and the library automatically transforms the user-provided single-device graph into a multi-device graph for parallel training. On the portability side, and most importantly, this work proposes a unified interface description: a single interface can express all common model-parallel training strategies, which makes migrating one split algorithm to another an easy task. At the backend, the library converts the framework-dependent computation graph into a framework-independent intermediate representation (IR) and performs parallel training on this IR, so framework developers only need to implement the conversion between IRs to port their algorithms to the library.

To give framework developers a unified description of split algorithms, this thesis defines the split strategy as a unified property and defines a property propagation process over the graph nodes based on the dimension mapping of the tensors. Property propagation lets the user-annotated split strategy spread through the computation graph under a single rule and configure the remaining nodes automatically, so framework developers can reuse one propagation logic to accelerate parallel training. Finally, to support heterogeneous devices, the thesis proposes a simple cost model to further accelerate parallel training. In practice, the library shows good acceleration performance and scaling efficiency: it achieves 94% scaling efficiency and a 3.77x speedup for the ResNet-50 model on 4 GPUs.
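To make the annotation-and-propagation idea more concrete, the sketch below shows one possible shape such an interface could take: a tensor-level split property annotated on a critical input, then propagated through a toy graph according to each node's dimension mapping. This is a minimal illustration under assumed names (SplitProperty, Node, propagate, dim_map); it is not the library's actual API.

```python
# Hypothetical sketch of annotating a split property on a critical tensor and
# propagating it through a computation graph via dimension mappings.
# All names here are illustrative assumptions, not the library's real interface.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SplitProperty:
    """Which tensor dimension is sharded, and across how many devices."""
    dim: int          # tensor dimension to split (e.g. 0 = batch dimension)
    num_shards: int   # number of devices the dimension is split across


@dataclass
class Node:
    """A node in a toy framework-independent IR."""
    name: str
    op: str
    inputs: list = field(default_factory=list)    # upstream Node objects
    # dim_map[i] = output dimension that input dimension i maps to (-1 = reduced away)
    dim_map: dict = field(default_factory=dict)
    split: Optional[SplitProperty] = None         # annotated or propagated property


def propagate(outputs):
    """Push user-annotated split properties through the graph.

    Each unannotated node inherits the split property of an annotated input,
    remapped through the node's dimension mapping, so the whole graph is
    configured from a few critical annotations.
    """
    for node in outputs:
        for inp in node.inputs:
            propagate([inp])                      # resolve inputs first
            if node.split is None and inp.split is not None:
                mapped_dim = node.dim_map.get(inp.split.dim, inp.split.dim)
                if mapped_dim >= 0:               # the split dimension survives this op
                    node.split = SplitProperty(mapped_dim, inp.split.num_shards)
    return outputs


# Usage: annotate only the input tensor, then let propagation configure the rest.
x = Node("x", "input", split=SplitProperty(dim=0, num_shards=4))  # shard the batch dim
matmul = Node("matmul", "matmul", inputs=[x], dim_map={0: 0})     # batch dim preserved
relu = Node("relu", "relu", inputs=[matmul], dim_map={0: 0})

propagate([relu])
print(relu.split)   # SplitProperty(dim=0, num_shards=4), inherited automatically
```

In a real system the propagation rules would be per-operator and the resulting multi-device graph would also insert the necessary communication, but the sketch captures the core idea: one annotation, one propagation rule, the rest of the graph set automatically.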