Research Of Optimization Method For GPU-based Multifrontal Method In Sparse Cholesky Factorization

Posted on:2016-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:W Wang

Full Text:PDF

GTID:2310330479453390

Subject:Computer system architecture

Abstract/Summary:

Solving large sparse systems of linear equations is a central component of many scientific computing and engineering applications. Cholesky factorization is widely used to solve such systems because of its high performance and accuracy. Over the past decades, many researchers have sought to factorize sparse matrices on CPU clusters to reduce overall computing time. With the rapid growth in the computational power of Graphics Processing Units (GPUs), several solutions have emerged to accelerate sparse direct solvers on GPUs. These approaches offload large dense operations to the GPU. However, they cannot fully utilize GPU computing resources because of the current GPU programming paradigm.

To address this problem, a GPU-based implementation of sparse Cholesky factorization is proposed, based on the multifrontal method. The method decomposes a large sparse coefficient matrix into a series of small dense matrices. General matrix-matrix multiplications (GEMMs) consume most of the computation time in factorizing these dense matrices, but they are difficult to execute efficiently in parallel on the GPU. Three optimization strategies are proposed to improve performance. A multiple task queues scheme is adopted to perform multiple GEMM operations in parallel, so that the data transfer and the kernel computation of multiple frontal matrices overlap. A threshold is set to decide the platform for each GEMM operation, CPU or GPU: if the amount of computation is larger than the threshold, the operation is offloaded to the GPU; otherwise, it is processed on the CPU. Multiple thread blocks are combined to perform one GEMM operation. To reduce computation time, the GEMM procedure is improved.

The GPU-based multifrontal method for sparse Cholesky factorization is implemented on the Linux operating system with the CUDA programming environment. Four testing schemes are run on six groups of test matrices. Experimental results show that, compared with multithreaded Cholesky factorization on the CPU, the GPU implementation with the above optimization mechanisms achieves up to 3.15× speedup. Compared with the implementation on the GPU, it achieves up to 1.98× speedup. When the three optimization mechanisms are applied to power flow computation, performance improves significantly.
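
The threshold rule and the multiple task queues scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the names GemmTask, choose_platform and assign_to_queues are hypothetical, the "amount of computation" is assumed to be the 2*m*n*k operation count of a GEMM, the round-robin choice of queue is assumed, and the threshold value is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GemmTask:
    """One GEMM C (m x n) += A (m x k) * B (k x n); hypothetical task record."""
    m: int
    n: int
    k: int

    def flops(self) -> int:
        # Assumed cost model: one multiply and one add per inner-product term.
        return 2 * self.m * self.n * self.k


def choose_platform(task: GemmTask, threshold_flops: int) -> str:
    """Return "gpu" if the task's cost is strictly larger than the threshold,
    otherwise "cpu", mirroring the rule stated in the abstract."""
    return "gpu" if task.flops() > threshold_flops else "cpu"


def assign_to_queues(
    tasks: List[GemmTask], threshold_flops: int, num_queues: int
) -> Tuple[List[GemmTask], List[List[GemmTask]]]:
    """Split tasks into a CPU list and num_queues GPU task queues.

    GPU-bound tasks are spread round-robin (an assumption) so that several
    queues hold work that could be transferred and computed concurrently.
    """
    if num_queues < 1:
        raise ValueError("num_queues must be at least 1")
    cpu_tasks: List[GemmTask] = []
    gpu_queues: List[List[GemmTask]] = [[] for _ in range(num_queues)]
    next_queue = 0
    for task in tasks:
        if choose_platform(task, threshold_flops) == "gpu":
            gpu_queues[next_queue].append(task)
            next_queue = (next_queue + 1) % num_queues
        else:
            cpu_tasks.append(task)
    return cpu_tasks, gpu_queues


if __name__ == "__main__":
    demo = [GemmTask(8, 8, 8), GemmTask(512, 512, 512), GemmTask(256, 256, 256)]
    cpu, queues = assign_to_queues(demo, threshold_flops=1_000_000, num_queues=2)
    print("CPU tasks:", cpu)
    for index, queue in enumerate(queues):
        print(f"GPU queue {index}:", queue)
```

In a CUDA setting each queue could plausibly be served by its own stream so that transfers and kernels belonging to different frontal matrices overlap. The abstract does not state how the queues are realized, so that mapping is left out of the sketch.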

Keywords/Search Tags:

Cholesky Factorization, Multifrontal Method, Multiple Task Queues Scheme, Task Allocation, Graphics Processing Unit

Related items

1	Research On Ships Collaborative Multiple Task Planning Method
2	Research On Multi-UAV Task Allocation And Path Planning Based On Graph Partition
3	Research Of Task Allocation Problem Based On Spatial And Social Distance
4	Research On Multiple Task Granularities Oriented Parallel Computing Technology For Remote Sensing Image Mosaics
5	Research On Task Allocation Method Based On Coalition Formation Game
6	Research On Carrier Selection And Task Assignment Model With Considering Carbon Emission
7	Application Of Multi-task Learning Method Based On Lasso In The Stellar Parametrization
8	Research On Collaboration And Task Allocation Of Self-interested Agents In Coalitional Skill Games
9	Tree-Structured Task Allocation Algorithm And Optimization Via Group Multirole Assignment
10	Research On Task Allocation Algorithm Of Crowdsourcing Platform Based On Game Theory