| Vision is the main way for people to gather information. As carrier of visual information, video with its large amount of data is hard to transmit and store. Therefore, developing efficient codec algorithm has been the focus of academic circles and industrial community. From the first generation video coding protocol H.261to H.264and recent high efficient video coding (HEVC), scholars and engineers constantly optimize various codec algorithms, anticipating achieving better compression with less computation. The superior compressive performance and network compatibility make H.264the most extensively used video coding protocol. However, this is achieved by high computation complexity. Therefore, how to leverage H.264encoding efficiency is the hot point for research.Nowadays, computers are heading towards multi-core and many-core direction. Multi-core and many-core based parallel computing and algorithm design is in the beginning stage. GPU previously used for graphic rendering is a kind of many-core processing unit and is applied to general parallel computing in recent years. In2007, Nvidia put forward compute unified device architecture (CUDA) and the assorted CUDA-C programming language which have been a big contribution for the development of GPU parallel computing.Mapping video encoding algorithm onto GPU has been an important direction for accelerating H.264encoding algorithm and how to design H.264parallel encoding algorithm suitable for GPU is the key issue. After reading H.264protocol and serial encoding algorithm and fully exploiting GPU many-core computing ability, this paper has finished parallel algorithm design and implementation for a number of H.264key module:a full parallel motion estimation algorithm for tree-structured motion estimation; a ladder-like parallel motion estimation algorithm for H.264key module P frame; a B frame prediction parallel algorithm for B frame in H.264; an algorithm for H.264parallel encoder key module rate control suitable for CABAC entropy encoding. All parallel algorithms designed are implemented through CUDA programming on GPU, and are experimented thoroughly. The experiment results show that the parallel algorithms can greatly accelerate H.264encoding while maintaining video reconstruction quality, and can realize1080p real-time encoding. |