| With the rapid development of video technology, video resolution keeps increasing. At present, high-definition (HD) and ultra-high-definition (UHD) video have become mainstream, which poses great challenges for UHD video storage and transmission. The latest video coding standard, HEVC/H.265 (High Efficiency Video Coding), provides good compression efficiency for HD and UHD video. Compared with the previous-generation video coding standard H.264, HEVC can reduce the coding bit rate by nearly 50% at the same video quality. This gain in compression comes with a corresponding increase in coding complexity and coding time, which makes real-time encoding and decoding difficult. Therefore, to achieve real-time processing of UHD video, a high-throughput, high-performance HEVC codec chip is needed. This paper focuses on inter prediction in the HEVC encoder and proposes high-throughput hardware architectures for integer motion estimation and fractional motion estimation. (1) Motion estimation is the core module of HEVC inter prediction. To improve the compression efficiency of video images, the sizes and number of prediction units (PUs) have increased dramatically, resulting in high motion estimation complexity, which makes real-time processing of HD and UHD video challenging. For integer motion estimation, this paper proposes a hardware-friendly motion estimation algorithm and designs a corresponding hardware architecture. The algorithm is divided into two stages: rough search and fine search. In the fine search stage, prediction units of the same depth share the rough search result, which increases PU-level parallelism. In the hardware design, the rough search stage uses a scheduling strategy that reuses the reference pixels hierarchically, and a pipelined structure ensures complete reuse of the reference pixels and computes the matching cost in pipelined form; the fine search stage uses a raster-scan search strategy and reuses the reference pixel registers and SAD calculation units of the rough search stage, which greatly reduces hardware resources. Synthesis results in a 90 nm process show that the proposed architecture can reach a frequency of 377 MHz and can process 3840 × 2160 @ 60 fps in real time with a search range of ±64, which meets the requirement of processing HD video images in real time. (2) A fractional motion estimation module is also designed for inter prediction. Part of the interpolation filter unit is shared between the half-pixel filter and the quarter-pixel filter, and interpolation results are shared between different interpolation positions, which reduces the number of interpolations. Based on an analysis of the order in which the search point data are processed, the interpolation unit and the matching cost calculation unit are pipelined across the different search stages. The circuit structure of the interpolation filter unit is also optimized. The resulting design reaches a processing speed of 3840 × 2160 @ 30 fps. |
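
The two-stage integer search described in (1) can be illustrated with a minimal software sketch. The abstract does not state how the rough search selects its candidates, so the coarse grid with spacing `step` used here is an assumption, as are all function and parameter names (`sad`, `inside`, `rough_search`, `fine_search`). The sketch only shows the data flow: one rough search per block, followed by raster-scan fine searches that start from the shared rough result. It does not model the pipelining or the reference pixel reuse of the hardware.

```python
def sad(cur, ref, bx, by, w, h, mvx, mvy):
    """SAD between the w x h block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (mvx, mvy)."""
    total = 0
    for y in range(h):
        cur_row = cur[by + y]
        ref_row = ref[by + mvy + y]
        for x in range(w):
            total += abs(cur_row[bx + x] - ref_row[bx + mvx + x])
    return total


def inside(ref, bx, by, w, h, mvx, mvy):
    """True if the displaced block lies entirely inside the reference frame."""
    return (0 <= bx + mvx and bx + mvx + w <= len(ref[0])
            and 0 <= by + mvy and by + mvy + h <= len(ref))


def rough_search(cur, ref, bx, by, w, h, search_range, step):
    """Stage 1 (assumed form): evaluate candidates on a coarse grid of
    spacing `step`, starting from the zero motion vector."""
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, w, h, 0, 0)
    for mvy in range(-search_range, search_range + 1, step):
        for mvx in range(-search_range, search_range + 1, step):
            if inside(ref, bx, by, w, h, mvx, mvy):
                cost = sad(cur, ref, bx, by, w, h, mvx, mvy)
                if cost < best_cost:
                    best_mv, best_cost = (mvx, mvy), cost
    return best_mv


def fine_search(cur, ref, bx, by, w, h, center, radius, search_range):
    """Stage 2: raster-scan every integer candidate within `radius` of the
    shared rough result `center`; returns (best_mv, best_cost)."""
    best_mv, best_cost = None, None
    for mvy in range(center[1] - radius, center[1] + radius + 1):
        for mvx in range(center[0] - radius, center[0] + radius + 1):
            if max(abs(mvx), abs(mvy)) > search_range:
                continue  # stay inside the overall search range
            if not inside(ref, bx, by, w, h, mvx, mvy):
                continue
            cost = sad(cur, ref, bx, by, w, h, mvx, mvy)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```

In this sketch, `rough_search` would be called once for a block, and `fine_search` would then be called with the same `center` for each PU partition of that block (for example the full block and its two horizontal halves), using `radius = step - 1` so that the fine windows cover the gaps of the coarse grid. Because the fine searches depend only on the shared center, they are independent of one another, which is the property that allows PUs of the same depth to be processed in parallel.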
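
The sharing of interpolation results between fractional positions in (2) can be sketched in software as follows. The tap values are the standard HEVC luma interpolation filter coefficients (an 8-tap half-sample filter and 7-tap quarter-sample filters). Everything else is an assumption made for illustration: the function names, the cache keyed by horizontal phase, and the simplified single rounding step at the end, which is not bit-exact with the standard. The sketch does not reproduce the paper's circuit-level sharing between the half-pixel and quarter-pixel filters; it only shows how one horizontally filtered intermediate can serve several vertical positions.

```python
# HEVC luma interpolation taps, indexed by quarter-sample phase (0..3).
# Taps apply to integer samples at offsets -3..+4; each row sums to 64.
LUMA_TAPS = {
    0: (0, 0, 0, 64, 0, 0, 0, 0),
    1: (-1, 4, -10, 58, 17, -5, 1, 0),
    2: (-1, 4, -11, 40, 40, -11, 4, -1),
    3: (0, 1, -5, 17, 58, -10, 4, -1),
}


def filter_1d(samples, pos, phase):
    """Unnormalised filter output (scaled by 64) for the fractional
    position pos + phase/4 along a 1-D list of samples."""
    taps = LUMA_TAPS[phase]
    return sum(t * samples[pos - 3 + k] for k, t in enumerate(taps))


def interpolate_block(frame, x0, y0, size, phases):
    """Interpolate a size x size block at (x0, y0) for each (fx, fy) phase.

    The horizontally filtered rows are computed once per horizontal phase
    fx and reused for every vertical phase fy that shares it.
    Returns (blocks, number_of_horizontal_passes).
    """
    h_cache = {}
    blocks = {}
    for fx, fy in phases:
        if fx not in h_cache:
            # The vertical 8-tap filter needs rows y0-3 .. y0+size+3.
            h_cache[fx] = [
                [filter_1d(frame[y], x0 + x, fx) for x in range(size)]
                for y in range(y0 - 3, y0 + size + 4)
            ]
        rows = h_cache[fx]
        block = []
        for y in range(size):
            out_row = []
            for x in range(size):
                column = [r[x] for r in rows]
                # Index 3 + y in `column` corresponds to frame row y0 + y.
                value = filter_1d(column, 3 + y, fy)
                # Simplified normalisation: both passes scale by 64.
                out_row.append((value + 2048) >> 12)
            block.append(out_row)
        blocks[(fx, fy)] = block
    return blocks, len(h_cache)
```

For example, requesting the four positions `(2, 0)`, `(2, 1)`, `(2, 2)` and `(2, 3)` needs only one horizontal pass in this sketch, since all four share the half-sample horizontal phase. A hardware design can exploit the same dependency to reduce the number of filter operations.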