VLIW processors: Efficiently exploiting instruction level parallelism

Posted on: 2001-04-12

Degree: Ph.D.

Type: Dissertation

University: Stanford University

Candidate: Rudd, Kevin William

GTID: 1468390014455766

Subject: Electrical engineering

Abstract/Summary:

This dissertation explores high-performance, complexity-efficient processors, focusing on VLIW processors. Complexity efficiency is a qualitative characteristic describing a system in which performance has not yet reached the point of diminishing returns. Using the techniques described in this dissertation, simple statically scheduled very-long-instruction-word (VLIW) processors can be efficient architectures for exploiting instruction-level parallelism and can effectively address the needs of general-purpose computing.

We studied the ability of dynamic execution to exploit instruction-level parallelism in dynamic VLIW processors. Unlike previous studies, this study explores the benefits of dynamic execution on an instruction stream that already contains explicit instruction-level parallelism. Dynamic execution is thus applied to the problems that compilers have difficulty solving rather than to those that compilers solve readily, which reduces the need for complex and costly hardware. In addition to presenting performance results, we describe a general processor model and an execution definition that improve upon the precise execution model used in traditional processors, as well as the simulator that implements this new execution model. In our simulations, we varied a number of parameters, which allowed us to isolate the effect of each parameter on performance. The results show that although a small amount of reordering is adequate to eliminate almost all penalties associated with scheduling errors and latency variations, even a significant amount of reordering cannot eliminate the penalties associated with branch mispredictions and long memory latencies.

As an alternative to dynamic VLIW processors, we developed Replay Buffers, which extend static VLIW processors to support efficient multi-threading. Replay Buffers provide thread switches that cost zero switch cycles, overhead-free exception handling (beyond the cost of the exception handler itself), and reasonable tolerance of latency delays. They allow VLIW processors to meet the needs of general-purpose applications without the complexity of dynamic VLIW. Beyond improving the capabilities and performance of VLIW processors, this technique applies outside VLIW designs and can benefit any processor or system that uses pipelines, particularly those using wave pipelining.

Keywords/Search Tags:

VLIW processors, Parallelism, Problems that compilers, Performance

Related items

1	Complementary compiler and architecture features for embedded VLIW processors
2	Branch optimizations and instruction-level parallelism exploitation for dynamic superscalar and VLIW processors
3	Investigation On Basic Block Scheduling Optimization For Predicate Execution VLIW DSP
4	Computational limits of VLIW architectures for digital signal processing transforms
5	Exploiting Parallelism in Multicore Processors through Dynamic Optimizations
6	Memory and control organizations of stream processors
7	Application Of Data Intensive Researching Methods On Instruction Scheduling Of Clustered VLIW Processors
8	Performance enhancing software loop transformations for embedded VLIW/EPIC processors
9	Exploration of parallelism for probabilistic graphical models
10	Non-speculative Parallelism Strategies For Irregular Applications On CMPs