Image reconstruction is an important technique in image processing: it inverts the observations acquired by an imaging system to recover the original sample image. Over the past three decades, researchers have proposed image reconstruction models such as deconvolution, total variation, and compressed sensing, which have been widely applied to medical imaging, image denoising, deblurring, and super-resolution reconstruction. In recent years, Deep Neural Network (DNN) models have achieved remarkable results in image classification and speech recognition, stimulating researchers' enthusiasm for DNNs. DNN models for image reconstruction have gradually become a research hotspot, and well-designed DNN models improve reconstruction performance. The Graphics Processing Unit (GPU) is a hardware architecture capable of massively parallel computing. It is widely used for the training and inference of DNN models, effectively accelerating their deployment. With the rapid development of embedded GPU devices, researchers have in recent years implemented DNN models for classification and recognition on embedded devices. However, unlike classification DNNs, in which the feature dimension is reduced layer by layer, the feature dimension of an image reconstruction DNN may be equal to or larger than the input dimension, which significantly increases the amount of computation. Moreover, most image reconstruction DNN models, such as U-Net and ResNet, have asymmetric network structures, which leads to an uneven distribution of computation and hinders parallel design. It is therefore difficult to achieve a fast implementation on embedded GPU devices, which place strict demands on computational balance.

To address the structural asymmetry of DNN models for image reconstruction, this paper draws on ideas from graph signal processing, using the
computational load-balancing properties of graph structures to modify the structure of the DNN model, equalizing the model's computation and improving GPU resource utilization. To verify the effectiveness of the graph signal processing approach, the experiments take the Automated Transform by Manifold Approximation (AUTOMAP) model as the research object and replace the traditional convolution and deconvolution operations with graph convolution. The experiments show that the AUTOMAP model based on graph signal processing and the original AUTOMAP model converge to similar loss values. To accelerate the DNN model, this paper implements a parallel acceleration design for the optimized AUTOMAP model on top of the GPU-based CUDA programming model. The cuBLAS library is used to parallelize the fully connected layers. Exploiting the sparsity of the Laplacian matrix, storage in the ELLPACK-R format is proposed, which saves storage space, and a load-balanced sparse matrix-vector multiplication (SpMV) parallel optimization strategy is used to improve GPU execution efficiency. Based on the iterative structure of the Chebyshev polynomials, a parallel acceleration strategy is realized using the idea of basis vectors. For large matrix multiplications, blocking together with GPU shared memory is adopted for fast computation. The experimental model runs on NVIDIA's Jetson AGX Xavier embedded platform. Through this parallel optimization design, the DNN inference model for image reconstruction is implemented on the embedded platform, making full use of storage and computing resources to improve the computational efficiency of the model.
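The abstract mentions Chebyshev polynomial iteration and basis vectors but gives no formulas. A minimal NumPy sketch of the standard Chebyshev graph-convolution scheme is shown below (function names and the NumPy formulation are ours, not from the thesis; the thesis itself targets CUDA): the basis vectors T_k(L̃)x are built by the three-term recurrence T_k = 2·L̃·T_{k-1} − T_{k-2}, and the filter output is their weighted sum, so the dominant cost per basis vector is one SpMV.

```python
import numpy as np

def chebyshev_basis(L_tilde, x, K):
    """Compute the Chebyshev basis vectors T_k(L~) x for k = 0..K-1
    via the recurrence T_k = 2 L~ T_{k-1} - T_{k-2}.
    L_tilde is the rescaled graph Laplacian (spectrum in [-1, 1])."""
    T = [x]                              # T_0(L~) x = x
    if K > 1:
        T.append(L_tilde @ x)            # T_1(L~) x = L~ x
    for _ in range(2, K):
        T.append(2 * (L_tilde @ T[-1]) - T[-2])
    return T

def graph_conv(L_tilde, x, theta):
    """Graph convolution as a weighted sum of the basis vectors:
    y = sum_k theta_k * T_k(L~) x.  Each new basis vector depends
    only on the previous two, which is what makes the iteration
    amenable to the basis-vector parallelization the thesis describes."""
    T = chebyshev_basis(L_tilde, x, len(theta))
    return sum(c * t for c, t in zip(theta, T))
```

With theta = [1, 0, ..., 0] the filter reduces to the identity, which gives a quick sanity check of the recurrence.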
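The ELLPACK-R layout used for the sparse Laplacian can be illustrated with a small reference sketch (a NumPy model of the data layout; the actual thesis kernels are CUDA, and these helper names are ours): each row's nonzero values and column indices are padded to the longest row, and an explicit row-length array lets each GPU thread stop at its row's true length, which is the hook for the load-balanced SpMV strategy.

```python
import numpy as np

def to_ellpack_r(A):
    """Convert a dense matrix with many zeros to ELLPACK-R arrays:
    val/col padded to the widest row, plus the row-length array rl."""
    n_rows, _ = A.shape
    rl = (A != 0).sum(axis=1)                # nonzeros per row
    width = int(rl.max())
    val = np.zeros((n_rows, width))
    col = np.zeros((n_rows, width), dtype=int)
    for i in range(n_rows):
        nz = np.flatnonzero(A[i])
        val[i, :len(nz)] = A[i, nz]
        col[i, :len(nz)] = nz
    return val, col, rl

def spmv_ellpack_r(val, col, rl, x):
    """Reference SpMV over the ELLPACK-R arrays, one 'thread' per row.
    The inner loop runs only rl[i] iterations, not the padded width,
    which is what saves work on short rows."""
    y = np.zeros(val.shape[0])
    for i in range(val.shape[0]):
        for j in range(rl[i]):
            y[i] += val[i, j] * x[col[i, j]]
    return y
```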