
Design and Optimization of Parallel MTTKRP for Sparse Tensors on CPU and GPU

Posted on: 2022-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: R Hu
Full Text: PDF
GTID: 2480306731979859
Subject: Computer technology
Abstract/Summary:
With the development of information acquisition technology, real-world data are increasingly multi-dimensional and highly sparse. Traditional matrix decomposition, which flattens high-dimensional data into a matrix, can no longer meet the requirements of the big-data era. Sparse tensors arise in many large-scale applications involving multi-dimensional, sparse data. Tensor decomposition extends matrix decomposition to high-dimensional spaces: unlike matrix decomposition, it extracts the inherent low-dimensional structure of the data while preserving the hidden relationships among the data in the high-dimensional space. The Matricized-Tensor Times Khatri-Rao Product (MTTKRP) is the basic operation in tensor decomposition, especially in CANDECOMP/PARAFAC (CP) decomposition. CP decomposition factors a tensor into several simpler tensors and is usually computed with Alternating Least Squares (ALS). When ALS is applied to large sparse tensors, MTTKRP dominates the computation, so computing MTTKRP efficiently is the key to accelerating tensor decomposition. With the rapid development of GPUs in recent years, their highly parallel architecture is well suited to compute-intensive, highly parallel workloads. However, the irregular computation pattern, the sparse structure, and the large memory footprint of sparse tensor operations make sparse tensor computation very challenging.

Starting from MTTKRP, this thesis accelerates tensor operations using parallel techniques. First, to cope with the high-dimensional sparsity of real data, compressed formats are adopted for storage to improve memory utilization. Then, CPU parallel algorithms are designed for the different compressed formats. Finally, combining the parallel execution model and memory characteristics of the GPU, a GPU parallel MTTKRP algorithm is designed that optimizes the thread execution pattern and improves the parallel efficiency of the algorithm. The main work of this thesis is as follows.

Targeting the high-dimensional sparsity of tensors, different compressed sparse tensor formats are analyzed, and parallel MTTKRP algorithms for these formats are designed on the CPU platform to avoid computing on empty slices and empty fibers. By exploiting the relationship between the fiber and slice indices and the factor matrices, the computation is decomposed into vector inner products between fibers and matrix columns, reducing the number of iterations and the redundant computation in MTTKRP. Experiments show that a best speedup of 3.67× is achieved on the CPU for the MovieLens dataset.
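As an illustration of the core computation that these CPU algorithms parallelize, the following is a minimal OpenMP sketch of mode-1 MTTKRP over a third-order sparse tensor in COO format. It is not the fiber/slice-based formulation designed in the thesis; the storage layout, type names, and scheduling parameters are assumptions made only for this example.

```cpp
// Minimal sketch: nonzero-parallel MTTKRP along mode 1 for a third-order
// sparse tensor in COO format, i.e. M(i,:) += x(i,j,k) * (B(j,:) .* C(k,:)).
// M is I x R, B is J x R, C is K x R, all stored row-major.
#include <algorithm>
#include <cstdint>
#include <vector>

struct SparseTensorCOO {
    int64_t nnz;                      // number of stored nonzeros
    std::vector<int64_t> i, j, k;     // coordinates of each nonzero
    std::vector<double>  val;         // value of each nonzero
};

void mttkrp_mode1(const SparseTensorCOO& X,
                  const std::vector<double>& B,
                  const std::vector<double>& C,
                  std::vector<double>& M, int R)
{
    std::fill(M.begin(), M.end(), 0.0);
    #pragma omp parallel for schedule(dynamic, 4096)
    for (int64_t n = 0; n < X.nnz; ++n) {
        const double  x = X.val[n];
        const double* b = &B[X.j[n] * R];
        const double* c = &C[X.k[n] * R];
        double*       m = &M[X.i[n] * R];
        for (int r = 0; r < R; ++r) {
            // Different nonzeros can map to the same output row i, so the
            // update must be atomic in this simple nonzero-parallel scheme.
            #pragma omp atomic
            m[r] += x * b[r] * c[r];
        }
    }
}
```

A fiber- or slice-based compressed format of the kind analyzed in the thesis can avoid these atomic updates, since all nonzeros contributing to the same output row can be assigned to the same thread and accumulated as fiber-by-column inner products.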
GPU parallel algorithms are designed based on a hierarchical tensor storage format. First, the sparse tensor is grouped by slices and a slice-level parallel MTTKRP is proposed. Second, an MTTKRP with coalesced storage is proposed to achieve coalesced memory access and data reuse, and to exploit parallelism across fibers. Then, shared memory is used to hold the intermediate results of each slice, reducing memory access latency. Finally, a warp execution pattern is designed to increase the parallelism of the algorithm, and the rows of the result matrix are obtained through interleaved reduction. The proposed MTTKRP optimization algorithms are evaluated on public datasets; the experimental results show that the optimal strategies achieve an average speedup of 1.59× and up to 3.58× on the GPU.
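As an illustration of the slice-level parallel scheme described above, the following is a minimal CUDA kernel sketch. The slice_ids/slice_ptr layout, parameter names, and launch configuration are assumptions for this example only; the kernel shows one thread block per nonempty slice, coalesced reads of the factor-matrix rows, and shared memory holding the slice's intermediate row, but it does not reproduce the thesis's coalesced-storage, warp-execution, or interleaved-reduction optimizations.

```cuda
// Minimal sketch: slice-level parallel MTTKRP (mode 1) on the GPU.
// One thread block handles one nonempty slice i. Threads stride over the R
// factor columns, so consecutive threads read consecutive elements of the
// B and C rows (coalesced loads). The slice's partial row is accumulated in
// shared memory and written to M(i,:) once, so no atomics are needed.
__global__ void mttkrp_slice_kernel(const long long* slice_ids,  // tensor index i of each nonempty slice
                                    const long long* slice_ptr,  // nonzero range per slice (num_slices + 1 entries)
                                    const long long* jind,       // mode-2 index of each nonzero
                                    const long long* kind,       // mode-3 index of each nonzero
                                    const double*    vals,       // nonzero values
                                    const double*    B,          // J x R factor matrix, row-major
                                    const double*    C,          // K x R factor matrix, row-major
                                    double*          M,          // I x R output, row-major
                                    int R)
{
    extern __shared__ double acc[];            // intermediate result row for this slice (R doubles)
    const int s = blockIdx.x;                  // one block per nonempty slice
    const long long i = slice_ids[s];

    // Each thread owns columns r, r + blockDim.x, ... of the slice's row.
    for (int r = threadIdx.x; r < R; r += blockDim.x)
        acc[r] = 0.0;

    for (long long n = slice_ptr[s]; n < slice_ptr[s + 1]; ++n) {
        const double  x = vals[n];
        const double* b = B + jind[n] * R;
        const double* c = C + kind[n] * R;
        // Consecutive threads read consecutive columns of b and c: coalesced.
        for (int r = threadIdx.x; r < R; r += blockDim.x)
            acc[r] += x * b[r] * c[r];
    }

    // Single write of the finished row; only this block touches row i.
    for (int r = threadIdx.x; r < R; r += blockDim.x)
        M[i * R + r] = acc[r];
}
```

A launch such as mttkrp_slice_kernel<<<num_slices, 128, R * sizeof(double)>>>(...) assigns one 128-thread block per nonempty slice; rows of M belonging to empty slices are assumed to be zero-initialized beforehand.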
Keywords/Search Tags: MTTKRP, Parallel Computing, Sparse Tensor, Tensor Decomposition