Research On Training Acceleration Of Transformer Model On New Generation Sunway Supercomputer

Posted on:2024-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

GTID:2558307070951739

Subject:Electronic information

Abstract/Summary:

With the development of artificial intelligence technology, the transformer model has been widely used in machine translation, image analysis, human-computer dialogue, and other fields. However, neural network models built on the transformer usually have a huge number of parameters and a complex network structure, which significantly increases the demand for computing resources during training. How to train transformer models more efficiently has therefore become a problem that both academia and industry must face. Supercomputers have powerful computing and storage capabilities that can meet the resource requirements of transformer model training, so using supercomputers to accelerate transformer training has become a popular direction in the field of model acceleration.

With the development of domestic supercomputers, a new generation of the Sunway supercomputer was launched in 2021. Its SW26010-Pro multicore processor, independently developed by a Chinese research team, provides a solid computing foundation for domestic supercomputing. However, the hardware architecture of the new-generation Sunway supercomputer differs from the widely used GPU platform, so existing open-source libraries and model acceleration methods cannot be applied to it directly. Designing acceleration methods for transformer training that suit the hardware characteristics of the new-generation Sunway supercomputer is therefore an urgent problem. To address it, this thesis makes the following contributions:

1. A multi-grained operator fusion method based on the length of the input sequence, comprising coarse-grained and fine-grained operator fusion. Taking the memory management characteristics of the SW26010-Pro processor into account, the method uses operator fusion to reduce the amount of main-memory access and applies fusion at different granularities for different input sequence lengths, thereby reducing redundant main-memory accesses. The method also provides a calculation scheme that guarantees the numerical stability of the model, and the thesis proves that this scheme is equivalent to the standard calculation.

2. A mixed-precision training method based on partitioning the transformer model by layer, which uses different data formats for different layers to obtain a layer-based mixed-precision training strategy. On this basis, a layer-by-layer dynamic loss scaling algorithm is proposed to address numerical overflow in the model. Experimental results show that this algorithm effectively resolves the poor convergence observed under the mixed-precision strategy.

3. A parallel optimizer for the new-generation Sunway supercomputer. It first exploits the multi-core-group hardware structure of the SW26010-Pro processor to optimize data parallelism, and then uses a parallel optimizer that combines weight decay with a ring collective communication strategy for synchronized gradient updates.

For each optimization method, the thesis designs a large number of controlled experiments. The final experiments show that the proposed transformer training acceleration techniques for the new-generation Sunway supercomputer improve model training speed, which helps promote an ecosystem that integrates domestic supercomputers with artificial intelligence technology.
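The first contribution mentions a calculation method that keeps the fused operators numerically stable and is provably equivalent to the standard calculation. The abstract does not say which operator this concerns. A common case in transformer training is the softmax inside attention, so the minimal sketch below assumes a max-subtraction scheme of that kind. The function names are hypothetical. It shows the standard formula, the max-shifted form whose exponents can never overflow, and a chunked "online" form that a fused kernel could use when a row is streamed through limited local memory. All three yield the same probabilities up to rounding.

```python
import math

# Hypothetical helpers; assumes the stabilized operator is a max-shift softmax.


def softmax_naive(xs):
    # Standard definition exp(x_i) / sum_j exp(x_j); math.exp overflows for large x.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(xs):
    # exp(x_i - m) / sum_j exp(x_j - m) equals the standard form because the
    # common factor exp(-m) cancels; with m = max(xs) every exponent is <= 0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_online(xs, chunk=4):
    # Streaming variant: visit the row chunk by chunk, keeping a running max m
    # and a running denominator d that is rescaled whenever m increases.
    m = -math.inf
    d = 0.0
    for start in range(0, len(xs), chunk):
        block = xs[start:start + chunk]
        new_m = max(m, max(block))
        d = d * math.exp(m - new_m) + sum(math.exp(x - new_m) for x in block)
        m = new_m
    return [math.exp(x - m) / d for x in xs]
```

For an input such as `[1000.0, 1001.0, 999.0]`, `softmax_naive` raises `OverflowError`, while `softmax_stable` and `softmax_online` both return valid probabilities that sum to 1.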
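The second contribution applies dynamic loss scaling layer by layer. The abstract does not describe the algorithm, so the sketch below is only one plausible reading. It assumes the usual dynamic loss scaling rule (halve the scale and skip the update on overflow, double it after a run of clean steps), applied with a separate scale for each layer. The class, its parameters, and the `fake_fp16_backward` helper are all hypothetical. That helper imitates half-precision overflow by turning values above the fp16 maximum of 65504 into infinity. Gradient underflow is not modeled.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fake_fp16_backward(raw_grads, scale):
    # Hypothetical stand-in for a half-precision backward pass: scaled values
    # outside the fp16 range become +/-inf, mimicking overflow.
    out = []
    for g in raw_grads:
        v = g * scale
        out.append(v if abs(v) <= FP16_MAX else math.copysign(math.inf, v))
    return out


class LayerwiseLossScaler:
    """Hypothetical per-layer dynamic loss scaler (illustrative, not the thesis code)."""

    def __init__(self, layer_names, init_scale=2.0 ** 15, growth_interval=2000, min_scale=1.0):
        self.scale = {name: init_scale for name in layer_names}
        self.good_steps = {name: 0 for name in layer_names}
        self.growth_interval = growth_interval
        self.min_scale = min_scale

    def unscale_and_update(self, scaled_grads):
        """Map layer name -> unscaled grads, or None if that layer must skip this step."""
        result = {}
        for name, grads in scaled_grads.items():
            s = self.scale[name]
            if not all(math.isfinite(g) for g in grads):
                # Overflow in this layer only: shrink its scale and skip its update.
                self.scale[name] = max(s / 2.0, self.min_scale)
                self.good_steps[name] = 0
                result[name] = None
                continue
            result[name] = [g / s for g in grads]
            self.good_steps[name] += 1
            if self.good_steps[name] >= self.growth_interval:
                # A long run without overflow: try a larger scale for this layer.
                self.scale[name] = s * 2.0
                self.good_steps[name] = 0
        return result
```

In this design, a layer whose gradients overflow backs off on its own. The other layers keep, or grow, their larger scales instead of being forced down by a single global scale.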
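The third contribution synchronizes gradients with a ring-based communication strategy and combines the update with weight decay. The abstract gives no further details, so the following is a minimal single-process simulation under stated assumptions. Each simulated worker stands for one training process, for example one per core group. Gradients are summed with the classic ring all-reduce, which runs a reduce-scatter phase followed by an all-gather phase, 2(n-1) steps in total. The update is plain SGD with L2-style weight decay. The function names and hyperparameters are hypothetical and do not describe the thesis's optimizer.

```python
def ring_allreduce(buffers):
    """Sum equal-length vectors held by n simulated workers with the ring algorithm."""
    n = len(buffers)
    length = len(buffers[0])
    # Split indices into n contiguous chunks (some may be empty if length < n).
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    data = [list(b) for b in buffers]  # copies, so the inputs are not mutated

    # Phase 1, reduce-scatter: at step s, worker r sends chunk (r - s) % n to
    # worker (r + 1) % n, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [data[r][bounds[c][0]:bounds[c][1]] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst, lo = (r + 1) % n, bounds[c][0]
            for k, v in enumerate(payload):
                data[dst][lo + k] += v
    # Worker r now holds the fully reduced chunk (r + 1) % n.

    # Phase 2, all-gather: at step s, worker r forwards chunk (r + 1 - s) % n.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [data[r][bounds[c][0]:bounds[c][1]] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            lo, hi = bounds[c]
            data[(r + 1) % n][lo:hi] = payload
    return data


def data_parallel_step(weights, local_grads, lr=0.1, weight_decay=0.01):
    """Hypothetical synchronized SGD step with L2-style weight decay on every worker."""
    n = len(local_grads)
    summed = ring_allreduce(local_grads)
    new_weights = []
    for r in range(n):
        g_avg = [g / n for g in summed[r]]  # average gradient across workers
        new_weights.append([w - lr * (g + weight_decay * w) for w, g in zip(weights[r], g_avg)])
    return new_weights
```

At each step, a worker sends and receives only about 1/n of the gradient. Per-worker traffic therefore stays near 2(n-1)/n times the gradient size regardless of n, which is why ring schemes suit bandwidth-bound gradient synchronization.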

Keywords/Search Tags:

deep learning, Sunway supercomputer, transformer model, model acceleration, operator fusion

Related items

1	Research On Parallel Optimization Of Transformer Model Based On The New Generation Of Sunway Many-core Processors
2	Parallel Optimization Research Of Method Of Characteristics Based On Sunway Bluelight Ⅱ Supercomputer
3	Parallel Deep Learning Training System On Sunway TaihuLight
4	Implementation And Optimization Of Molecular Dynamics Application On Sunway Taihulight Supercomputer
5	Porting And Optimization Of OpenFOAM On The Sunway Taihulight Supercomputer
6	Optimization Of Molecular Dynamics Algorithms Based On The Sunway TaihuLight Supercomputer
7	The Design And Optimization Of High-performance Molecular Dynamics Algorithms On The Sunway TaihuLight Supercomputer
8	Implementation And Optimization Of Tensor Library Based On Sunway Domestic Supercomputer Platform
9	Research On Directive-based Parallel Language For Sunway Taihulight Supercomputer And Design Of The Compiling Optimization
10	Model Compression And Forward Acceleration Based On Embedded Deep Neural Network