
Research On Sparse Matrix Multiplication Acceleration Technology For Sunway Architecture

Posted on: 2024-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Peng
GTID: 2568307127954989
Subject: Electronic information
Abstract/Summary:
With the rapid development of Internet and artificial intelligence technology, the demand for high-performance computing keeps growing in academia and industry, which greatly promotes the development of high-performance computing technology in emerging fields. Heterogeneous parallel architectures have become the mainstream in high-performance computing, and the design of basic algorithms for platforms built on such architectures has become one of the field's research hotspots. As a core algorithm in many high-performance computing applications, sparse matrix multiplication is a key technology in scientific computing and engineering research, and its efficiency directly affects the performance of the whole application. Sparse matrix multiplication is a typical irregular operation with a low compute-to-memory-access ratio and irregular memory access, which makes it harder to accelerate in parallel on heterogeneous high-performance architectures than dense matrix multiplication. In view of this, this thesis proposes efficient parallel acceleration algorithms for sparse matrix multiplication operations, using the SW26010P many-core processor, a domestic heterogeneous high-performance processor architecture, as the computing platform. The main contributions of this thesis are as follows.

(1) For sparse matrix-vector multiplication (SpMV), this thesis proposes a parallel SpMV algorithm for the SW26010P processor architecture. The algorithm stores the sparse matrix in the HYB format. To address the difficulty of threshold selection in the HYB format, this thesis proposes a new threshold selection method based on the idea of the OTSU algorithm, called the multi-iteration OTSU method. Using the selected threshold, the sparse matrix is divided into a COO part and an ELL part. The COO part, which is not suitable for parallelization, is processed serially on the management processing element (MPE), while the ELL part, which is suitable for parallelization, is processed in parallel on the computing processing elements (CPEs). Based on the architectural characteristics of the hardware platform, two performance optimization schemes are adopted to improve the memory access bandwidth and computing performance of the ELL part. The experimental results show that, on a single core group (CG), the algorithm achieves an average speedup of 23.36 and a best speedup of 34.85 compared with the sequential method on the MPE. DMA transmission bandwidth optimization improves performance by 5.93% on average, and the double-buffer mechanism improves performance by 16.30% on average.

(2) For sparse matrix-sparse vector multiplication (SpMSpV), this thesis proposes a parallel SpMSpV algorithm for the SW26010P processor architecture. To eliminate the redundant data involved in the SpMSpV operation, both the original sparse matrix and the sparse vector are pre-processed to remove unnecessary data. To solve the load imbalance among CPEs caused by inappropriate partitioning of the sparse matrix data, this thesis proposes a new load-balanced data partition strategy that balances the load among CPEs as evenly as possible. Finally, a parallel design of the SpMSpV operation that combines the data pre-processing and the load-balanced strategy is proposed. The experimental results show that, on a single CG, the algorithm achieves a good acceleration effect and better scalability when the input sparse vector has high sparsity.

(3) For sparse-dense matrix-matrix multiplication (SpMM), this thesis proposes a parallel SpMM algorithm for the SW26010P processor architecture. To overcome the limit that the small LDM capacity places on the SpMM problem size, this thesis proposes a data partition strategy combined with the memory mode of the SW26010P processor. Based on this data partition strategy, a parallel design of SpMM using the CSR format is proposed. In the parallel design, the result matrix is divided by row into multiple sub-matrix blocks, and the input sparse matrix is divided into sub-matrix blocks accordingly. The product of each input sparse sub-matrix block and the dense matrix is then calculated by column block. The experimental results show that, on a single CG, the algorithm achieves average speedups of 28.22 and 21.99 compared with the sequential method on the MPE when the dense matrix has 32 and 128 columns, respectively.
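
The HYB split described in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the multi-iteration OTSU method, so `otsu_threshold` below is a plain, single-pass Otsu-style selection over row lengths, used here only as an assumed stand-in. The function names, the row-list input layout, and the `-1` padding marker are likewise hypothetical, and the ELL loop runs serially here, whereas the thesis runs that part in parallel on the CPEs.

```python
def otsu_threshold(row_lengths):
    """Assumed stand-in: plain Otsu-style choice of the ELL width k.

    Picks the k that maximizes the between-class variance of the two
    classes {rows with length <= k} and {rows with length > k}.
    """
    if not row_lengths:
        return 0
    n = len(row_lengths)
    best_k, best_var = max(row_lengths), -1.0
    for k in sorted(set(row_lengths)):
        low = [r for r in row_lengths if r <= k]
        high = [r for r in row_lengths if r > k]
        if not high:  # every row fits in ELL; nothing to separate
            continue
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_k, best_var = k, var
    return best_k


def split_hyb(rows, k):
    """Split rows of (col, val) pairs into an ELL part of width k and a COO part."""
    ell_cols, ell_vals, coo = [], [], []
    for i, row in enumerate(rows):
        head, tail = row[:k], row[k:]
        pad = k - len(head)
        ell_cols.append([c for c, _ in head] + [-1] * pad)  # -1 marks padding
        ell_vals.append([v for _, v in head] + [0.0] * pad)
        coo.extend((i, c, v) for c, v in tail)  # overflow entries go to COO
    return ell_cols, ell_vals, coo


def spmv_hyb(ell_cols, ell_vals, coo, x):
    """Compute y = A @ x from the two parts of a HYB matrix."""
    y = [0.0] * len(ell_cols)
    # ELL part: fixed width per row, rows are independent (the regular part).
    for i, (cols, vals) in enumerate(zip(ell_cols, ell_vals)):
        y[i] = sum(v * x[c] for c, v in zip(cols, vals) if c >= 0)
    # COO part: irregular leftovers, accumulated one entry at a time.
    for i, c, v in coo:
        y[i] += v * x[c]
    return y
```

For example, calling `otsu_threshold` on the per-row nonzero counts, passing the result as `k` to `split_hyb`, and then calling `spmv_hyb` reproduces an ordinary matrix-vector product while keeping the long rows' overflow entries out of the padded ELL arrays.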
Keywords/Search Tags: Sparse matrix multiplication operations, High-performance computing, Parallel computing, Sunway many-core processor, Algorithm acceleration