
Macroblock Level Parallel Implementation And Its Scheduling Optimization Strategy For H.264 Decoders

Posted on: 2017-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: J F Pan
Full Text: PDF
GTID: 2348330503989858
Subject: Computer system architecture

Abstract/Summary:
As users grow more sensitive to video quality, video resolutions keep rising to improve the viewing experience. Higher definition, however, dramatically increases the computational complexity of video decoding and poses great challenges to real-time decoding systems. The newly emerging heterogeneous multi-core platforms composed of a CPU and a GPU offer abundant computing resources, and making full use of such platforms to accelerate video decoding, improve throughput, and reduce latency has become a hot topic in both academia and industry. Much work has been done on parallelizing the H.264 decoder, and macroblock-level parallelism is the most promising and significant approach owing to its high scalability. Using the GPU to exploit the potential parallelism among macroblocks and fully utilize its many cores can raise decoding efficiency, which is of great significance for meeting the real-time demands of high-definition video decoding.

After analyzing the data dependencies and parallelism among the macroblocks of a frame, a GPU-based parallel decoding method is proposed. Unlike existing GPU-based pixel-level parallel methods, whose high data-transfer overhead neutralizes the computational speedup and lowers overall efficiency, the proposed macroblock-level technique combines the 2D-Wave approach with the structural features of the H.264 decoder to decode in parallel, which improves decoding efficiency, reuses data on the GPU, and effectively hides the transfer cost. Because the computational complexity of macroblocks varies, thread-synchronization overhead arises; an improved method that joins 2D-Wave with macroblock-level complexity prediction is proposed to reduce this synchronization overhead.
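The 2D-Wave dependency pattern mentioned above is standard in H.264: macroblock (x, y) depends on its left, top-left, top, and top-right neighbours, so every macroblock with the same wavefront index x + 2y can be decoded concurrently. The sketch below (an illustration, not code from the thesis; the function name `wavefronts` is our own) groups a frame's macroblocks into such fronts:

```python
def wavefronts(mb_width, mb_height):
    """Group macroblocks of an mb_width x mb_height frame into 2D-Wave
    fronts. MB(x, y) depends on MB(x-1, y), MB(x-1, y-1), MB(x, y-1)
    and MB(x+1, y-1); all of those have a strictly smaller value of
    x + 2*y, so every macroblock sharing one wavefront index w = x + 2*y
    is independent and can be decoded in parallel."""
    fronts = {}
    for y in range(mb_height):
        for x in range(mb_width):
            fronts.setdefault(x + 2 * y, []).append((x, y))
    # Return fronts in execution order: each front waits on the previous.
    return [fronts[w] for w in sorted(fronts)]
```

A decoder would launch one GPU kernel (or one thread group) per front and synchronize between fronts; the number of fronts for a W x H macroblock grid is (W - 1) + 2(H - 1) + 1, so parallelism ramps up, plateaus, and ramps down over the frame.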
Drawing on existing studies of frame-level decoding-complexity prediction and analyzing the factors that affect macroblock decoding complexity at each decoding step, this thesis builds a macroblock-level decoding-complexity prediction model suited to the proposed parallel method. The mapping between data and threads is determined by the CUDA programming model together with the 2D-Wave pattern.

Experiments were conducted on an NVIDIA GPU. Compared with existing GPU-based pixel-level parallelism, the results show that full data reuse effectively reduces the data-transfer overhead, although no speedup is obtained at the application level.
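The abstract does not give the form of the complexity model, so the following is only a minimal sketch under assumed details: a hypothetical linear predictor over per-macroblock features (e.g. non-zero coefficient count, motion-vector count), used to order macroblocks within a wavefront so that thread groups finish closer together and idle less at the end-of-front barrier. The names `linear_cost` and `schedule_front` are ours, not the thesis's:

```python
def linear_cost(features, weights):
    """Hypothetical linear complexity predictor: a weighted sum of
    per-macroblock features such as non-zero coefficient count and
    motion-vector count. The real model and weights are not given
    in the abstract; this only illustrates the idea."""
    return sum(w * f for w, f in zip(weights, features))

def schedule_front(front, predict_cost):
    """Order the macroblocks of one 2D-Wave front by predicted
    decoding cost, longest first, so the slowest work starts earliest
    and the synchronization barrier at the end of the front waits
    less on stragglers."""
    return sorted(front, key=predict_cost, reverse=True)
```

This longest-first ordering is a classic load-balancing heuristic; on a GPU it would translate into assigning the heaviest macroblocks to thread groups first.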
Keywords/Search Tags: Macroblock Level Parallelism, CUDA Programming Model, Macroblock Level Computational Complexity, Parallel Optimization