Research On Reproducible Parallel Algorithms And Algorithm Libraries

Posted on:2023-02-23

Degree:Master

Type:Thesis

Country:China

Candidate:K He

Full Text:PDF

GTID:2568307169981179

Subject:Engineering

Abstract/Summary:

Because of rounding errors, floating-point operations do not satisfy the associative law: when the order of computation changes, different rounding errors are generated and the results differ. The multi-level parallel structure and dynamic resource scheduling of modern computers increase this uncertainty and make non-reproducible results more frequent. We combine error-free transformation, pre-rounding and multi-layer chunking techniques to implement an efficient reproducible algorithm library, based on the OpenBLAS design architecture, for domestic processor platforms. It includes the reproducible basic linear algebra library FT-ReproBLAS and the trusted reduction library MPI_ACCU_REDUCE.

FT-ReproBLAS mainly contains three parts: a multi-layer chunked reproducible summation algorithm (on top of which the reproducible dot product and norm functions are implemented), a mixed-precision multi-layer chunked dot product algorithm, and a multi-threaded reproducible DGEMV algorithm. A multi-level parallel structure comprising SIMD, OpenMP and MPI is built on the multi-layer chunked reproducible summation algorithm. Tested on three different ARM platforms, the summation algorithm achieves a speedup of 3.5 to 5 times over the mainstream ReproBLAS library. The mixed-precision multi-layer chunked dot product algorithm extends the set of operations in FT-ReproBLAS. It has two variants, which balance efficiency and accuracy by applying different computational precision inside and outside the chunks, thereby exploiting low-precision computing power. The multi-threaded reproducible DGEMV function achieves a speedup of at least 2 times over the DGEMV algorithm in ReproBLAS and, in the single-threaded case, a speedup of more than 20 times over the reproducible DGEMV function in OzBLAS.

MPI_REDUCE is one of the most commonly used global reduction operations in MPI; it performs a global reduction over all members of a process group. The trusted reduction library MPI_ACCU_REDUCE contains five global reduction operations: high-precision summation, high-precision product, high-precision l2 norm, reproducible summation, and reproducible exact summation. Each operation is bound to its reduction operator through the MPI_Op_create function. The library provides a reliable computational tool and is of practical importance to massively parallel computing.

The main work of this thesis is:

1. To address the non-reproducibility of floating-point results, a more accurate and efficient multi-layer chunked reproducible algorithm is designed, and an efficient reproducible linear algebra library, FT-ReproBLAS, is implemented on the domestic processor platform. The library mainly contains three parts: the multi-layer chunked reproducible summation algorithm, the mixed-precision multi-layer chunked dot product algorithm, and the multi-threaded reproducible DGEMV algorithm. It provides an effective tool for solving the problem of non-reproducible results in large-scale scientific computing.

2. To address the unreliability of the global reduction function MPI_REDUCE in MPI, we design and implement the reduction library MPI_ACCU_REDUCE, which contains five trusted reduction operations. Each operation is invoked through its corresponding reduction operator, which improves the accuracy and reliability of the reduction results.
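
To make the ideas in the abstract concrete, the following Python sketch shows why naive summation depends on the order of its inputs, what an error-free transformation looks like, and how pre-rounding can make a sum independent of that order. It is an illustration only and not the thesis's FT-ReproBLAS code: the function names (naive_sum, two_sum, prerounded_sum), the levels parameter and the way the boundary sigma is chosen are assumptions made for this example. It assumes IEEE 754 double precision with round-to-nearest and ignores overflow, NaN and infinities.

```python
import math
import random


def naive_sum(xs):
    """Plain left-to-right accumulation; the result depends on the order of xs."""
    total = 0.0
    for x in xs:
        total += x
    return total


def two_sum(a, b):
    """Error-free transformation (Knuth's TwoSum).

    Returns (s, e) where s is the rounded sum and s + e equals a + b exactly.
    """
    s = a + b
    b_virtual = s - a
    e = (a - (s - b_virtual)) + (b - b_virtual)
    return s, e


def prerounded_sum(xs, levels=2):
    """Order-independent summation by pre-rounding (hypothetical sketch).

    At each level every element is split exactly into a part q that lies on a
    coarse grid and a remainder x - q. The grid is coarse enough that adding
    all the q parts never rounds, so their total is the same in any order.
    """
    n = len(xs)
    rest = list(xs)
    totals = []
    for _ in range(levels):
        max_abs = max((abs(x) for x in rest), default=0.0)
        if max_abs == 0.0:
            break
        _, e = math.frexp(max_abs)       # max_abs < 2**e
        m = (n + 2).bit_length()         # n < 2**m
        sigma = math.ldexp(1.0, e + m)   # boundary: n * max_abs < sigma
        total = 0.0
        for i, x in enumerate(rest):
            q = (sigma + x) - sigma      # x rounded to the grid of sigma
            total += q                   # exact: no rounding error occurs
            rest[i] = x - q              # exact remainder for the next level
        totals.append(total)
    # Combine the per-level totals in a fixed order, smallest level first.
    result = 0.0
    for t in reversed(totals):
        result += t
    return result


a = [1e16, 1.0, -1e16, 1.0]
b = [1e16, -1e16, 1.0, 1.0]              # same values, different order
print(naive_sum(a), naive_sum(b))            # 1.0 2.0
print(prerounded_sum(a), prerounded_sum(b))  # 2.0 2.0
print(two_sum(1.0, 1e-16))                   # (1.0, 1e-16)

random.seed(0)
data = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8) for _ in range(1000)]
shuffled = random.sample(data, len(data))
print(prerounded_sum(data) == prerounded_sum(shuffled))  # True
```

The maximum absolute value and the element count do not depend on the input order, so every level uses the same sigma whatever the order. A parallel version could compute the per-level totals per chunk and combine them with a reduction, since those partial totals are exact. More levels recover more of the low-order bits at the cost of extra passes over the data.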

Keywords/Search Tags:

Rounding error, reproducibility, multilayer chunking, MPI_REDUCE, Reliable

Related items

1	The Study Of The View About The Culture Communication In 'Arts In The Age Of Its Technological Reproducibility'
2	The geometric rounding: Theory and applications
3	Error Control Algorithms and Architectures for Reliable DSP Systems
4	Reproducibility Measurement Based On Adaptive Hybrid Copula And Its Application In High-Throughput Depth Sequencing
5	A study of the reproducibility and repeatability of dynamic mechanical tests on polymers and metals
6	Quantitive Error Analysis Method For Complex Computation Process And Application
7	Pre-registration:Reproducibility Crisis And The Journal Editing System Revolution
8	Le chunking perceptif de la parole: Sur la nature du groupement temporel et son effet sur la memoire immediate
9	Research On Error Analysis Of Floating-Point Program And Its Implementation
10	Research On Adaptive Unequal Error Protection And Reliable Transmission Of Video Data