
Macroblock Level Parallel Implementation And Its Scheduling Optimization Strategy For H.264 Decoders

Posted on: 2017-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: J F Pan
Full Text: PDF
GTID: 2348330503989858
Subject: Computer system architecture

Abstract/Summary:
As users grow more sensitive to video quality, video resolutions keep rising to improve the viewing experience. Higher definition, however, dramatically increases the computational complexity of video decoding and poses great challenges to real-time decoding systems. The newly emerging heterogeneous multi-core platforms composed of a CPU and a GPU offer abundant computing resources, and making full use of such platforms to accelerate video decoding, improve throughput, and reduce latency has become a hot topic in both academia and industry. Much work has been done on parallelizing the H.264 decoder, and macroblock-level parallelism is the most promising and significant approach owing to its high scalability. Using the GPU to exploit the potential parallelism among macroblocks and fully utilize its many cores can raise decoding efficiency, which is of great significance for meeting the real-time demands of high-definition video decoding.

After analyzing the data dependencies and parallelism among the macroblocks of a frame, a GPU-based parallel decoding method is proposed. Unlike existing GPU-based pixel-level parallel methods, whose high data-transfer overhead neutralizes the computational speedup and lowers overall efficiency, the proposed macroblock-level technique combines the 2D-Wave approach with the structural features of the H.264 decoder to decode in parallel, which improves decoding efficiency, reuses data on the GPU, and effectively hides the transfer cost. Because the computational complexity of macroblocks varies, thread-synchronization overhead arises; an improved method that joins 2D-Wave with macroblock-level complexity prediction is proposed to reduce this synchronization overhead.
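The 2D-Wave dependency pattern mentioned above is standard in H.264: macroblock (x, y) depends on its left, top-left, top, and top-right neighbours, so every macroblock with the same wavefront index x + 2y can be decoded concurrently. The sketch below (an illustration, not code from the thesis; the function name `wavefronts` is our own) groups a frame's macroblocks into such fronts:

```python
def wavefronts(mb_width, mb_height):
    """Group macroblocks of an mb_width x mb_height frame into 2D-Wave
    fronts. MB(x, y) depends on MB(x-1, y), MB(x-1, y-1), MB(x, y-1)
    and MB(x+1, y-1); all of those have a strictly smaller value of
    x + 2*y, so every macroblock sharing one wavefront index w = x + 2*y
    is independent and can be decoded in parallel."""
    fronts = {}
    for y in range(mb_height):
        for x in range(mb_width):
            fronts.setdefault(x + 2 * y, []).append((x, y))
    # Return fronts in execution order: each front waits on the previous.
    return [fronts[w] for w in sorted(fronts)]
```

A decoder would launch one GPU kernel (or one thread group) per front and synchronize between fronts; the number of fronts for a W x H macroblock grid is (W - 1) + 2(H - 1) + 1, so parallelism ramps up, plateaus, and ramps down over the frame.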
Drawing on existing studies of frame-level decoding-complexity prediction and analyzing the factors that affect macroblock decoding complexity at each decoding step, this thesis builds a macroblock-level decoding-complexity prediction model suited to the proposed parallel method. The mapping between data and threads is determined by the CUDA programming model together with the 2D-Wave pattern.

Experiments were conducted on an NVIDIA GPU. Compared with existing GPU-based pixel-level parallelism, the results show that full data reuse effectively reduces the data-transfer overhead, although no speedup is obtained at the application level.
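The abstract does not give the form of the complexity model, so the following is only a minimal sketch under assumed details: a hypothetical linear predictor over per-macroblock features (e.g. non-zero coefficient count, motion-vector count), used to order macroblocks within a wavefront so that thread groups finish closer together and idle less at the end-of-front barrier. The names `linear_cost` and `schedule_front` are ours, not the thesis's:

```python
def linear_cost(features, weights):
    """Hypothetical linear complexity predictor: a weighted sum of
    per-macroblock features such as non-zero coefficient count and
    motion-vector count. The real model and weights are not given
    in the abstract; this only illustrates the idea."""
    return sum(w * f for w, f in zip(weights, features))

def schedule_front(front, predict_cost):
    """Order the macroblocks of one 2D-Wave front by predicted
    decoding cost, longest first, so the slowest work starts earliest
    and the synchronization barrier at the end of the front waits
    less on stragglers."""
    return sorted(front, key=predict_cost, reverse=True)
```

This longest-first ordering is a classic load-balancing heuristic; on a GPU it would translate into assigning the heaviest macroblocks to thread groups first.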
Keywords/Search Tags: Macroblock Level Parallelism, CUDA Programming Model, Macroblock Level Computational Complexity, Parallel Optimization