| The motion estimator is a key component of high-compression video codecs and accounts for the largest share of their computation. Its performance determines the quality of video compression as well as the processing capability, power consumption, and hardware cost of the system chip. Motion estimation consists of integer motion estimation and fractional motion estimation. In both natural and synthetic video sequences, the true frame-to-frame displacement of moving objects is rarely an integer number of pixels, so sub-pixel motion vectors improve the accuracy of motion estimation. The H.264 video coding standard adopts 1/4-pixel accuracy for motion estimation, which improves coding quality and compression efficiency but also adds a heavy computational load. Now that many efficient VLSI architectures have been designed for integer motion estimation, the processing time of fractional motion estimation has become a major bottleneck. To achieve real-time processing, fractional motion estimation must therefore be accelerated in hardware by a highly parallel VLSI architecture. Based on a study of fractional motion estimation in the H.264 reference software JM11.0, this thesis reduces the search procedure for the best-matching fractional pixel to 7 loops, extracts its key modules, and restructures it into a more parallel form. Following VLSI design methodology, this thesis proposes a novel VLSI architecture for fractional motion estimation. The architecture is based on the full-search block-matching algorithm, adopts sub-macroblock decomposition and vertical integration techniques, and uses a 7x7 systolic array to search the half-pixel and 1/4-pixel positions in parallel. It greatly reduces data storage and transfer, which saves memory resources and simplifies the data flow and control flow, resulting in a more efficient VLSI design. 
Synthesized with Synopsys Design Compiler in HJTC 0.18 μm technology, the proposed architecture reaches a maximum operating frequency of 147 MHz with a gate count of 276k. It can process 109k macroblocks per second, which meets the real-time processing requirement of 720p HDTV (1280x720) at 30 frames per second. |
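The real-time claim can be checked with simple arithmetic. The sketch below is not part of the thesis; it assumes the standard 16x16 H.264 macroblock and uses only the figures quoted in the abstract (147 MHz, 109k macroblocks per second, 1280x720 at 30 frames per second). The function name and constants are illustrative.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 147_000_000          # maximum operating frequency
CLAIMED_MB_PER_SEC = 109_000    # reported macroblock throughput
MB_SIZE = 16                    # assumption: 16x16 luma macroblocks, as in H.264


def macroblocks_per_second(width, height, fps, mb_size=MB_SIZE):
    """Macroblocks that must be processed each second for a given video format."""
    mbs_per_row = -(-width // mb_size)   # ceiling division
    mb_rows = -(-height // mb_size)
    return mbs_per_row * mb_rows * fps


# 1280x720 at 30 frames per second: 80 x 45 macroblocks per frame.
required = macroblocks_per_second(1280, 720, 30)

# Average clock-cycle budget per macroblock implied by the reported figures.
cycles_per_mb = CLOCK_HZ / CLAIMED_MB_PER_SEC

print(f"required throughput: {required} macroblocks/s")
print(f"reported throughput: {CLAIMED_MB_PER_SEC} macroblocks/s")
print(f"cycle budget per macroblock: {cycles_per_mb:.0f}")
```

Under these assumptions the format needs 108,000 macroblocks per second, just under the reported 109k. The reported figures imply an average budget of roughly 1,350 clock cycles per macroblock.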