| AVS2 is China's latest audio and video coding standard, with independent intellectual property rights, and is the successor to AVS1 and AVS+. Compared with H.264/AVC, AVS2 more than doubles coding efficiency at the same visual quality by introducing a flexible quad-tree partition structure and a number of novel coding techniques. The coding efficiency of AVS2 is comparable to that of its counterpart HEVC, and is up to 4 times that of HEVC in the scene coding mode. These improvements, however, considerably increase coding complexity, which makes real-time coding difficult when AVS2 is applied to Full High Definition (FHD) and Ultra High Definition (UHD) video. With the development of parallel computing, the Graphics Processing Unit (GPU) has shown great advantages in parallel processing of large volumes of data and in memory bandwidth. The Compute Unified Device Architecture (CUDA) developed by NVIDIA makes general-purpose application development on the GPU more convenient. In this thesis, the highly time-consuming motion estimation (ME) stage of AVS2 inter prediction is optimized using CUDA. The optimized modules are the ME pre-search, the integer-pixel search, and the fractional-pixel search. The detailed work is as follows: (1) In the ME pre-search, a fast search algorithm is applied within the Largest Coding Unit (LCU) to find a rough best Motion Vector (MV). On the GPU, threads compute the Sum of Absolute Differences (SAD) of 4x4 blocks, and the rough best MV is obtained by parallel reduction. (2) In the integer-pixel search, the large CU data structure is replaced by a mapping table and a quasi-integral-image algorithm is adopted, so that the cost of any type of Prediction Unit (PU) can be calculated by merging the SADs of 4x4 blocks. Data access is accelerated by several optimizations: storing the image data of the current frame and the reference frame in shared memory and texture memory, using shared memory appropriately, keeping intermediate values in local variables, optimizing instructions, and calling CUDA intrinsic functions. (3) In the fractional-pixel search, the parallel design fully exploits the hierarchical partition structure: by iterating over each CU depth, the fractional-pixel ME is effectively accelerated. The experimental results show that the optimized AVS2 encoding scheme on the GPU achieves a shorter coding time than the CPU implementation. The proposed scheme has both research significance and practical value. |
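
The 4x4 SAD merging and the reduction described in items (1) and (2) can be illustrated with a small serial sketch. It is not the thesis implementation: it assumes that the "quasi-integral-image" step behaves like a summed-area table built over the grid of 4x4 SADs, and all function names (`sad_grid_4x4`, `summed_area_table`, `pu_sad`, `argmin_reduce`) are hypothetical. On a GPU, each 4x4 SAD would typically be computed by its own thread and the reduction would run in shared memory; plain Python loops stand in for both here.

```python
def sad_grid_4x4(cur, ref, x0, y0, size, mvx, mvy):
    """Return a (size/4) x (size/4) grid holding one SAD per 4x4 block.

    cur/ref are 2-D lists of pixels, (x0, y0) is the top-left corner of the
    block in the current frame, and (mvx, mvy) is the candidate motion vector.
    """
    n = size // 4
    grid = [[0] * n for _ in range(n)]
    for y in range(size):
        for x in range(size):
            diff = cur[y0 + y][x0 + x] - ref[y0 + y + mvy][x0 + x + mvx]
            grid[y // 4][x // 4] += abs(diff)
    return grid


def summed_area_table(grid):
    """Build a table where sat[j][i] is the sum of grid[0:j][0:i]."""
    h, w = len(grid), len(grid[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for j in range(h):
        for i in range(w):
            sat[j + 1][i + 1] = (grid[j][i] + sat[j][i + 1]
                                 + sat[j + 1][i] - sat[j][i])
    return sat


def pu_sad(sat, bx, by, bw, bh):
    """SAD of a PU covering 4x4 blocks [bx, bx+bw) x [by, by+bh), in O(1)."""
    return (sat[by + bh][bx + bw] - sat[by][bx + bw]
            - sat[by + bh][bx] + sat[by][bx])


def argmin_reduce(costs):
    """Pairwise (tree) min-reduction over a non-empty list of costs.

    Emulates the usual GPU reduction pattern; returns (index, cost).
    """
    items = list(enumerate(costs))
    while len(items) > 1:
        half = (len(items) + 1) // 2
        for i in range(len(items) - half):
            if items[i + half][1] < items[i][1]:
                items[i] = items[i + half]
        items = items[:half]
    return items[0]


if __name__ == "__main__":
    # Synthetic frames, used only to exercise the functions.
    cur = [[(3 * x + 5 * y) % 17 for x in range(24)] for y in range(24)]
    ref = [[(7 * x + 2 * y) % 13 for x in range(24)] for y in range(24)]
    sat = summed_area_table(sad_grid_4x4(cur, ref, 4, 4, 16, 1, -2))
    print(pu_sad(sat, 0, 0, 4, 2))         # upper 16x8 PU of a 16x16 CU
    print(argmin_reduce([9, 4, 7, 1, 8]))  # (3, 1)
```

Once the table is built for one MV candidate, the cost of every PU shape aligned to the 4x4 grid (16x16, 16x8, 8x16, 8x8 and so on) takes four lookups, which is the benefit the merging step aims for.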