
Optimization Method of Tensor Mathematical Operations

Posted on: 2024-08-21 | Degree: Master | Type: Thesis
Country: China | Candidate: S W Fan
GTID: 2530307052496154 | Subject: Electronic information
Abstract/Summary:
As a strategic national asset, supercomputers play an increasingly important role in weather forecasting, intelligent education, aerospace, biotechnology, and other fields. Tensor mathematical operations apply basic arithmetic and elementary functions element-wise to multidimensional tensor data. They are an important component of high-performance applications and have a broad impact on application performance. However, because of the unique heterogeneous architecture and software environment of the new-generation Sunway processor, tensor mathematical operations in high-performance applications struggle to make full use of the available computing resources and achieve good performance. Given the heterogeneous architecture of the new-generation Sunway processor and the computational characteristics of tensor mathematical operations, scalar-oriented elementary function optimization and tensor-oriented parallel optimization are effective ways to improve performance. In addition, applications in different fields have different requirements for the accuracy of their results, and higher computational accuracy usually comes at the expense of performance. The results therefore need only be as accurate as the application requires, which makes mixed-precision optimization another effective way to improve performance. However, because programs involve complex computations and the search space of mixed-precision schemes is huge, it is difficult to optimize program performance sufficiently while keeping the results within a given accuracy requirement. This paper proposes an optimization method for tensor mathematical operations on the new-generation Sunway processor. The method considers the multi-level parallel architecture and memory access
characteristics of the new-generation Sunway processor and achieves efficient parallel computation of tensor mathematical operations. Taking the computational characteristics of tensor mathematical operations into account, it implements vector elementary functions for the new-generation Sunway processor. Provided that the accuracy requirements of the tensor mathematical operations are met, it uses lower-precision floating-point variables and elementary functions to reduce running time and memory overhead. Specifically, the main contributions of this paper are as follows:

(1) Multi-level parallel optimization method for the new-generation Sunway processor: Based on the processor's heterogeneous architecture, a multi-level parallel optimization method is designed to make full use of the computing resources of every core. First, following the programming model of the new-generation Sunway processor, a parallel optimization method based on the MPE (management processing element) and the CPEs (computing processing elements) is designed to support both process-level and thread-level parallelism. Second, to access data efficiently, techniques such as DMA-based data transfer and memory access latency hiding are designed. As a result, tensor mathematical operations run efficiently on the new-generation Sunway processor.

(2) Elementary function optimization method: Targeting the computational characteristics of tensor mathematical operations and the architectural characteristics of the new-generation Sunway processor, an elementary function optimization method is designed. Using the processor's SIMD instructions and TGen, an automatic generator of variable-precision elementary functions, vector elementary functions with multiple precision versions are implemented. Exploiting the LDM (the CPEs' local scratchpad memory), a data table preloading method for elementary functions is designed to make table lookups more efficient. As a result, high-performance vector elementary functions suited to tensor mathematical operations and the new-generation Sunway processor are realized.

(3) Mixed-precision optimization method: Building on the static precision tuning tool Daisy, a precision tuning method that combines static and dynamic tuning is designed. Static precision tuning is performed with Daisy, and the precision of floating-point variables and elementary functions is then adjusted dynamically using a global error allocation method. Finally, the best scheme is selected from the dynamically adjusted candidates, achieving mixed-precision optimization of both floating-point variables and elementary functions.

The effectiveness of the method is verified on typical operators from CORPIN, Rosa, and a deep learning model. Experimental results show that, compared with the elementary functions in the basic math library of the new-generation Sunway processor, the optimized elementary functions achieve an average speedup of about 4.53 times. Compared with the single-core version, the optimized tensor mathematical operations achieve an average speedup of about 112.19 times on one core group, and an average speedup of about 138.10 times with a parallel efficiency of about 128.83% on two core groups.
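Contribution (1) splits the work at two levels: across processes and, within each process, across CPEs. The Python sketch below is an illustration under stated assumptions, not the thesis's implementation: it assumes one process per core group and 64 CPEs per core group, splits a flattened tensor into contiguous, nearly equal blocks at each level, and emulates all CPEs serially. The names block_range, cpe_range, and elementwise are hypothetical.

```python
def block_range(n, parts, idx):
    """Half-open range [start, end) of block `idx` when n items are split
    into `parts` contiguous blocks whose sizes differ by at most one."""
    base, rem = divmod(n, parts)
    start = idx * base + min(idx, rem)
    end = start + base + (1 if idx < rem else 0)
    return start, end


def cpe_range(n, nprocs, rank, ncpes, cpe_id):
    """Two-level split: first across processes, then across the CPEs
    that belong to one process (assumed: one process per core group)."""
    p_start, p_end = block_range(n, nprocs, rank)                 # process level
    c_start, c_end = block_range(p_end - p_start, ncpes, cpe_id)  # CPE level
    return p_start + c_start, p_start + c_end


def elementwise(x, f, nprocs=2, ncpes=64):
    """Apply f element-wise to the flattened tensor x, emulating every
    (process, CPE) pair serially; on hardware these would run in parallel."""
    out = [None] * len(x)
    for rank in range(nprocs):
        for cpe in range(ncpes):
            start, end = cpe_range(len(x), nprocs, rank, ncpes, cpe)
            for i in range(start, end):
                out[i] = f(x[i])
    return out
```

For example, elementwise([0.0, 1.0, 2.0], lambda v: v * v) returns [0.0, 1.0, 4.0]; CPEs that receive an empty range simply do nothing. On real hardware each CPE would compute its own range from its ID and process only that slice, and contiguous blocks keep each CPE's data in one region, which suits bulk DMA transfers.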
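The memory access latency hiding in contribution (1) is commonly realized with double buffering: while a CPE computes on one LDM buffer, the DMA engine fills the other. The Python sketch below only emulates that schedule. A single worker thread stands in for the DMA engine, and dma_get and process_double_buffered are hypothetical names; the sketch is not the thesis's code and makes no claim about the actual Sunway DMA interface.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_get(mem, start, end):
    """Stand-in for an asynchronous DMA copy from main memory into LDM."""
    return list(mem[start:end])


def process_double_buffered(mem, f, chunk):
    """Apply f to every element of mem, chunk by chunk, so that the copy
    of chunk i+1 is already in flight while chunk i is being computed."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    n = len(mem)
    out = []
    # One worker thread plays the role of the DMA engine.
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(dma_get, mem, 0, min(chunk, n)) if n else None
        for start in range(0, n, chunk):
            buf = pending.result()            # wait for the current chunk
            nxt = start + chunk
            # Issue the next transfer into the "other" buffer before computing.
            pending = (dma.submit(dma_get, mem, nxt, min(nxt + chunk, n))
                       if nxt < n else None)
            out.extend(f(v) for v in buf)     # compute overlaps the copy
    return out
```

In this emulation results are appended to a Python list; a real CPE kernel would also write results back to main memory by DMA, and that write-back can be overlapped in the same way. The chunk size would be bounded by the LDM capacity, since two input buffers (and any output buffers) must fit at once.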
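Contribution (2) combines table lookups, LDM preloading, and multiple precision versions. TGen's actual algorithms are not described in this abstract, so the sketch below swaps in a generic, textbook table-driven exp: range reduction to x = k*ln(2)/n + r, a lookup of 2^(j/n) from a small table, and a truncated Taylor polynomial for exp(r) whose degree sets the precision. make_exp and its parameters are hypothetical; on a CPE, the table is the kind of data that would be preloaded into LDM once.

```python
import math


def make_exp(table_bits=5, degree=4):
    """Build a table-driven exp approximation (a generic textbook scheme,
    not TGen's algorithm). A lower `degree` gives a cheaper, less accurate
    version of the same function."""
    n = 1 << table_bits
    # Table of 2^(j/n), j = 0..n-1: built once, then reused by every call.
    table = [2.0 ** (j / n) for j in range(n)]
    step = math.log(2.0) / n
    inv_step = n / math.log(2.0)
    # Taylor coefficients 1/i! for exp(r) on the small reduced interval.
    coeffs = [1.0 / math.factorial(i) for i in range(degree + 1)]

    def exp_approx(x):
        k = round(x * inv_step)      # x = k*ln(2)/n + r
        r = x - k * step             # |r| <= ln(2)/(2n), up to rounding
        m, j = divmod(k, n)          # k = m*n + j with 0 <= j < n
        p = 0.0
        for c in reversed(coeffs):   # Horner evaluation of the polynomial
            p = p * r + c
        return math.ldexp(table[j] * p, m)   # 2^m * 2^(j/n) * exp(r)

    return exp_approx
```

With a 32-entry table, make_exp(5, 4) stays within a relative error of about 1e-12 for moderate arguments, while make_exp(5, 2) trades accuracy (roughly 2e-7) for two fewer multiply-adds. Production libraries usually use minimax rather than Taylor coefficients, and a SIMD version would evaluate the same steps on several lanes at once.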
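For contribution (3), the abstract names a global error allocation method without detailing it. The sketch below is a hypothetical, simplified stand-in: given an overall error budget, it greedily moves operations to low precision in order of estimated time saved per unit of error, assuming as a first-order approximation that per-operation errors add up. The function allocate_precision and its inputs are illustrative assumptions, and the static Daisy analysis that would supply the error estimates is not modeled.

```python
def allocate_precision(ops, budget):
    """Greedy global error allocation (hypothetical illustration).

    ops:    dict name -> (err_low, saving), where err_low is the estimated
            contribution of that operation to the final error if it runs in
            low precision (its high-precision contribution is taken as 0),
            and saving is the estimated time saved by doing so.
    budget: total error the application can tolerate.
    Errors are assumed to add linearly (a first-order bound).
    Returns the set of operations to run in low precision.
    """
    def value(name):
        err, saving = ops[name]
        return saving / err if err > 0 else float("inf")

    chosen, used = set(), 0.0
    # Spend the error budget where it buys the most time first.
    for name in sorted(ops, key=value, reverse=True):
        err, _ = ops[name]
        if used + err <= budget:
            chosen.add(name)
            used += err
    return chosen
```

For instance, with ops = {"exp": (4e-7, 5.0), "mul": (1e-7, 1.0), "log": (6e-7, 2.0)} and a budget of 6e-7, exp and mul are lowered and log stays in high precision. A greedy pass like this is only one way to search the configuration space; the thesis selects its final scheme from several dynamically adjusted candidates.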
Keywords: New-generation Sunway processor, Tensor mathematical operation, Elementary function, Mixed-precision