| JPEG 2000 is a wavelet-based image compression standard. Because of its good performance at low bit rates, its support for progressive transmission and region-of-interest coding, and its good robustness, it is widely used in remote sensing, aerospace, medicine, the military, meteorology, and other major fields. Kakadu is currently one of the most efficient implementations of JPEG 2000. Its distinctive three-tier architecture greatly reduces the complexity of the image codec, and its object-oriented design gives it good reusability. However, as technology develops, especially in the aerospace and military fields, the decoding of compressed images is subject to higher speed requirements. CPU-based decoding is currently costly and inefficient, and struggles to meet practical demand. In a CPU, caches and control units consume most of the transistor budget, whereas a graphics processing unit (GPU) devotes more of its transistors to arithmetic logic units (ALUs); its computing capability therefore has a great advantage over the CPU, and it is better suited to large-scale parallel processing. To improve the decoding efficiency of the Kakadu compression system and meet the needs of practical applications, this paper presents a GPU-based parallel optimization of a JPEG 2000 decoding system built on Kakadu. The core parts of the Kakadu decoding system are implemented with high-performance parallel computing on the GPU. The paper first describes the JPEG 2000 image compression standard, the GPU, and the CUDA programming workflow, and then carries out the GPU-based parallel optimization of the JPEG 2000 decoder. The main work is as follows. 1. High-performance parallel implementation of the Tier-2 part. The Tier-2 part is divided into three parts: header parsing, tile header parsing, and stream organization. This paper uses block-level parallelism: one GPU block processes one image, and each GPU block runs several threads in parallel. 2. High-performance parallel implementation of the Tier-1 part. 
The Tier-1 module uses code-block-level parallel decoding, since the code blocks are independent of each other. One GPU block decodes one image, and one thread decodes one code block. 3. High-performance parallel implementation of the inverse wavelet transform. Rows within an image are processed in parallel, while successive images are processed serially. The inverse wavelet transform comprises four steps: pre-scaling, vertical filtering, horizontal filtering, and post-scaling. Each step is accelerated through thread-level parallelism. The number of GPU blocks is set equal to the number of image rows: one GPU block handles one row, giving parallelism across rows, and one thread handles one pixel, giving parallelism across pixels. After the GPU-based parallel optimization of the JPEG 2000 decoder, the quality of the decompressed image is the same as before the optimization. With the decoded image quality preserved, the decoding speed increases by a factor of 2 to 4. The GPU parallel optimization greatly accelerates the decoding system as a whole, improves the throughput of JPEG 2000 image decompression, and can meet the demands of decoding large volumes of image data. These results show that the GPU-based parallel optimization of the JPEG 2000 decoder delivers high performance. |
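
The row-wise structure of the inverse wavelet transform described above can be illustrated with a minimal CPU sketch. It is not Kakadu's code and not the paper's GPU kernel. It assumes the irreversible 9/7 transform in lifting form with the standard CDF 9/7 constants, one common scaling convention (low-pass divided by `K`, high-pass multiplied by `K` in the forward direction), interleaved coefficients, and rows of at least two samples. It covers only the horizontal pass, with the scaling undone before the lifting steps. The names `forward_row`, `inverse_row`, and `inverse_rows` are hypothetical.

```python
# Standard CDF 9/7 lifting constants (irreversible JPEG 2000 wavelet).
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852
K = 1.230174105  # scaling constant; the convention used here is an assumption


def _mirror(i, n):
    """Whole-sample symmetric extension of index i into [0, n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def _lift(x, parity, coeff):
    """Update samples of one parity from their opposite-parity neighbours.

    Each updated sample reads only samples of the other parity, so all
    updates within one call are independent of each other.
    """
    n = len(x)
    for i in range(parity, n, 2):
        x[i] += coeff * (x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)])


def forward_row(row):
    """Forward 9/7 lifting on one row (used here only to produce test data)."""
    x = [float(v) for v in row]
    for parity, coeff in ((1, ALPHA), (0, BETA), (1, GAMMA), (0, DELTA)):
        _lift(x, parity, coeff)
    # Even positions hold low-pass samples, odd positions high-pass samples.
    return [v / K if i % 2 == 0 else v * K for i, v in enumerate(x)]


def inverse_row(coeffs):
    """Inverse 9/7 lifting on one row: undo scaling, then undo lifting steps."""
    x = [v * K if i % 2 == 0 else v / K for i, v in enumerate(coeffs)]
    for parity, coeff in ((0, DELTA), (1, GAMMA), (0, BETA), (1, ALPHA)):
        _lift(x, parity, -coeff)
    return x


def inverse_rows(image):
    """Apply the horizontal inverse transform to every row independently."""
    return [inverse_row(row) for row in image]
```

Because each row is transformed without reference to any other row, every iteration of the loop in `inverse_rows` could be assigned to its own GPU block. Within a single lifting step, each sample of one parity depends only on samples of the other parity, which is what makes a one-thread-per-sample mapping possible.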