Font Size: a A A

Research Of Optimization Method For GPU-based Multifrontal Method In Sparse Cholesky Factorization

Posted on:2016-05-20Degree:MasterType:Thesis
Country:ChinaCandidate:W WangFull Text:PDF
GTID:2310330479453390Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
In varieties of scientific computing and engineering applications, it is the all-important component to solve large sparse systems of linear equations. Cholesky factorization is widely used to solve sparse linear equations for its high performance, accuracy. Over the past decades, many researchers had sought to factorize sparse matrix in CPU cluster to reduce the overall computing time. With the rapid development of computational power of Graphics Processing Unit(GPU), there are several solutions to accelerate sparse direct solvers on GPUs. The approaches involve off-loading large dense operations to GPU. However the approaches can not sufficient utilize GPU computing resources because of current GPU programming paradigm.In order to solve above problem, GPU-based implementation of sparse cholesky factorization is proposed based on multifrontal method. A large sparse coefficient matrix is decomposed into a series of small dense matrices in the method, GEMMs consume most of computation time in the factorization of dense matrices, but they are hardly able to be performed better in parallel on GPU. Three optimization strategies are proposed to accelerate the performance. Multiple task queues scheme is adopted to perform multiple GEMM operations in parallel, which can guarantee that the overhead of data transfer and the kernel computation from multiple frontal matrices be overlapped. A threshold is set to decide the platform of the GEMM operation, i.e. CPU or GPU, specifically, if the calculated quantity is larger than the threshold, then the operation is offloaded to GPU, otherwise, it will be processed on CPU. Multiple thread blocks is combined to perform one GEMM operation. In order to reduce the computation time, the procedure of GEMM operation is improved.Based on Linux operating system and CUDA programming environment, GPU-based multifrontal method in sparse cholesky factorization is implemented. There are four testing schemes to be performed on six group testing matrices. Experimental results show that, compared with multithreaded cholesky factorization on CPU, our GPU implementation based on above optimization mechanisms can achieve up to 3.15× speedup. Meanwhile, compared with the implementation on GPU, it can achieve up to 1.98× speedup. Above three optimization mechanisms are used in power flow computation, the performance has a significant improvement.
Keywords/Search Tags:Cholesky Factorization, Multifrontal Method, Multiple Task Queues Scheme, Task Allocation, Graphics Processing Unit
PDF Full Text Request
Related items