| JPEG2000 is a still-image compression standard based on the discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). Because of its superior compression performance at low bit rates, its progressive transmission, and its region-of-interest coding, it has become the mainstream algorithm for remote sensing image compression. However, the JPEG2000 standard has high algorithmic complexity and slow processing speed, so research on JPEG2000 codec systems has great practical significance. On-board compression systems generally use aerospace-grade encoding chips, constrained by the harsh conditions on the satellite such as ultra-low temperatures, high radiation, and a limited power budget. The terrestrial decoder faces fewer restrictions than the satellite encoder and can be built from dedicated decoding chips, CPUs, or CPU clusters. However, developing a decoding chip has disadvantages such as a long development cycle, high cost, and a low reuse rate. The traditional CPU decoding scheme has poor computing performance and struggles to meet the decoding speed requirements. The CPU cluster decoding scheme requires a separate machine room and consumes a large amount of power, which makes it expensive to maintain. The graphics processing unit (GPU) addresses the problems of computing power, power consumption, cost, and reuse rate, and provides a new solution for JPEG2000 decoding. Based on the engineering design of a satellite decoding system in China, this thesis studies the JPEG2000 decoding process in combination with the characteristics of the GPU. The thesis first introduces the JPEG2000 image compression pipeline, the development of GPU architectures, and the CUDA programming method; it then presents the parallel design of the DWT, Tier-1, inverse quantization, and post-processing modules; finally, it optimizes data transmission between modules and CPU-GPU collaborative work. The main work of the thesis is summarized as follows: (1) GPU implementation of a row-block wavelet transform method. To address the heavy computation and the large share of processing time consumed by the discrete wavelet transform, this thesis investigates the row-based and block-based wavelet transform methods, realizes a new row-block method that combines the advantages of both, and uses shuffle instructions instead of shared memory for data communication between threads. The resulting wavelet transform is more than 150 times faster than the CPU implementation. (2) Parallel design of the Tier-1, inverse quantization, and post-processing modules. The Tier-1 module uses block-level parallelism, in which each GPU thread decodes one block of data independently. Modules such as inverse quantization and post-processing use pixel-level parallelism, in which each GPU thread processes a single pixel. The resulting Tier-1 and post-processing modules are more than 8 times and 18 times faster, respectively, than the single-threaded OpenJPEG implementation. (3) Efficient inter-module data transmission and CPU-GPU collaborative work design. In the JPEG2000 decoding process, data is transferred serially between the Tier-2, Tier-1, and inverse quantization modules. This thesis designs an efficient kernel function that transfers data between these modules in parallel. By combining the multi-core CPU architecture with GPU streams, it also designs an efficient CPU-GPU collaborative working model. The GPU decoder designed in this thesis is 10 times and 8 times faster than single-threaded OpenJPEG and Kakadu decoding, respectively. Together with the JPEG2000 compression chip developed by our research group, the GPU decoder forms a high-performance JPEG2000 codec system, which has been successfully applied to a satellite codec system. |
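
To make the wavelet step in item (1) concrete, the following is a minimal CPU reference sketch in Python, not the thesis's GPU kernel. It assumes the reversible 5/3 lifting filter from JPEG2000 Part 1; the abstract does not state which filter the decoder uses. The function names `forward_53` and `inverse_53` are hypothetical. The sketch shows that every lifting step reads only the immediate left and right neighbours of a sample, which is the kind of short-range exchange between adjacent threads that a warp shuffle can serve in a row-wise GPU implementation.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform on a 1-D row of ints.

    Returns (low_pass, high_pass). Clamping the neighbour index at the row
    ends is equivalent here to whole-sample symmetric extension.
    """
    even, odd = x[0::2], x[1::2]
    if not odd:  # a single sample has no high-pass part
        return list(even), []
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, len(even) - 1)]) >> 1)
         for i in range(len(odd))]
    # Update step: each even sample plus a rounded quarter of its detail neighbours.
    s = [even[i] + ((d[max(i - 1, 0)] + d[min(i, len(d) - 1)] + 2) >> 2)
         for i in range(len(even))]
    return s, d


def inverse_53(s, d):
    """Inverse of forward_53: undo the update step, then undo the predict step."""
    if not d:
        return list(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[min(i, len(d) - 1)] + 2) >> 2)
            for i in range(len(s))]
    odd = [d[i] + ((even[i] + even[min(i + 1, len(even) - 1)]) >> 1)
           for i in range(len(d))]
    # Re-interleave even and odd samples into the original order.
    x = [0] * (len(s) + len(d))
    x[0::2] = even
    x[1::2] = odd
    return x


if __name__ == "__main__":
    row = [10, 12, 14, 16]
    low, high = forward_53(row)
    print(low, high)
    print(inverse_53(low, high) == row)
```

Because both directions use the same integer arithmetic, the round trip is exact for any integer row. In a decoder only `inverse_53` would be needed, applied along rows and then columns at each resolution level.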