GPU Preconditioning Algorithm Based On Block Structured Incomplete Sparse Approximation Inverse

Posted on: 2022-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Hu
GTID: 2480306749978409
Subject: Biology
Abstract/Summary:
How to construct an effective preconditioner that accelerates the convergence of iterative solvers for large-scale sparse linear systems has long been a research focus in numerical computing. As the scale of linear systems grows, inexact preconditioning methods have gradually attracted attention, because they allow accuracy to be traded for time. In recent years, with the rapid development of GPUs, GPU-based inexact methods for constructing preconditioners have achieved fruitful results. Current research on preconditioner construction focuses on scalar matrices, but in practical engineering applications such as multiphysics problems, coefficient matrices often appear in block form. Although most numerical algorithms for block-structured linear systems are derived from their scalar counterparts, the two still differ considerably in implementation strategy and performance tuning. This thesis therefore studies a preconditioning algorithm that rapidly constructs an efficient block incomplete sparse approximate inverse on heterogeneous CPU+GPU computing systems. The main work is as follows:

(1) Based on the Incomplete Sparse Approximate Inverse (ISAI) algorithm, a block-structured Block Incomplete Sparse Approximate Inverse (Block-ISAI) preconditioning algorithm is proposed, together with an efficient implementation strategy for accelerating it on the GPU platform. In terms of preconditioning, a method based on Block-ILU decomposition normally solves block triangular linear systems in the preconditioning step by forward and backward substitution, which have strong data dependencies. To avoid this, the Block-ISAI algorithm transforms the preconditioning step into highly parallelizable matrix-vector multiplications. In terms of GPU implementation, the globally sparse and locally dense structure of block matrices is fully considered and combined with GPU optimization techniques such as coalesced memory access and shared memory. Based on the warp, the basic scheduling unit in CUDA, a warp-based strategy for allocating thread tasks is proposed. It ensures that the warps compute the columns of the approximate inverses of the lower and upper triangular factors in parallel, one column per warp and in units of blocks, so as to make full use of the fine-grained, highly concurrent computing characteristics of the GPU.

(2) The Block-ISAI preconditioning algorithm proposed in this thesis is compared in detail with the cuSPARSE-based preconditioning algorithm that solves the block triangular linear systems, in terms of both the efficiency of the preconditioning step and the overall solution time of GMRES, a Krylov subspace method. The comparison was carried out on a heterogeneous computing platform consisting of an Intel E5-2640 v4 @ 2.40 GHz processor and an NVIDIA Tesla V100 GPU. Tests on several typical matrices selected from the SuiteSparse Matrix Collection show that, although the Block-ISAI algorithm increases the number of iterations, it greatly shortens the time spent on preconditioning during the iteration. The total solution time with the Block-ISAI algorithm is less than that with the cuSPARSE-based preconditioning algorithm, with speedups ranging from 1.19 to 6.69.
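The idea of replacing a triangular solve with a matrix-vector product can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes scalar entries (1×1 blocks), dense list-of-lists storage, and runs on the CPU in plain Python. The function names `isai_lower`, `matvec`, and `forward_substitution` are hypothetical and chosen only for this illustration. For each column j, the approximate inverse M is restricted to the sparsity pattern of L, and a small triangular system is solved so that L·M matches the identity on that pattern. The columns are independent of each other, which is the property a one-column-per-warp mapping relies on.

```python
def isai_lower(L):
    """Column-wise incomplete sparse approximate inverse of a lower
    triangular matrix L (assumption: dense list of lists, non-zero diagonal).

    M gets the same sparsity pattern as L, and L*M equals the identity
    on that pattern. Each column is computed independently.
    """
    n = len(L)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Rows where column j of L is non-zero: pattern of column j of M.
        S = [i for i in range(n) if L[i][j] != 0.0]
        # Solve the small lower triangular system L[S, S] * m = e_j[S].
        m = []
        for p, i in enumerate(S):
            rhs = 1.0 if i == j else 0.0
            acc = sum(L[i][S[q]] * m[q] for q in range(p))
            m.append((rhs - acc) / L[i][i])
        for p, i in enumerate(S):
            M[i][j] = m[p]
    return M


def matvec(A, x):
    """Dense matrix-vector product; every row can be computed in parallel."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def forward_substitution(L, b):
    """Exact solve of L x = b; row i depends on all earlier rows."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 1.0, 2.0]]
b = [2.0, 3.0, 4.0]

M = isai_lower(L)
approx = matvec(M, b)               # preconditioning step as a mat-vec
exact = forward_substitution(L, b)  # sequential triangular solve
print(approx, exact)
```

For this small bidiagonal matrix the sketch gives `[1.0, 1.0, 1.25]` from the matrix-vector product and `[1.0, 1.0, 1.5]` from the exact solve. The last entry differs because the exact inverse has a non-zero at position (2, 0) that lies outside the pattern of L. This loss of accuracy is consistent with the increased iteration count reported above, while the matrix-vector product removes the row-by-row dependency of the substitution.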
Keywords/Search Tags: Incomplete Sparse Approximate Inverse, Preconditioning, Block Linear Systems, GPU, Warp