Symbiotic subordinate threading (SST)

Posted on: 2008-07-25
Degree: Ph.D.
Type: Dissertation
University: University of Maryland, College Park
Candidate: Mameesh, Rania
Full Text: PDF
GTID: 1449390005453265
Subject: Engineering
Abstract/Summary:
Integration of multiple processor cores on a single die, relatively constant die sizes, increasing memory latencies, and emerging new applications create new challenges and opportunities for processor architects. How can we build a multi-core processor that provides high single-thread performance while enabling high throughput through multi-programming? Conventional approaches to high single-thread performance use a large instruction window for memory latency tolerance, which requires large and complex cores. However, to integrate more cores on the same die for high throughput, cores must be simpler and smaller.

We present an architecture that obtains high performance for single-threaded applications in a multi-core environment, while using simpler cores to meet the high throughput requirement. Our scheme, called Symbiotic Subordinate Threading (SST), achieves the benefits of a large instruction window by utilizing otherwise idle cores to run dynamically constructed subordinate threads (a.k.a. helper threads) for the individual threads running on the active cores.

In our proposed execution paradigm, the subordinate thread fetches and pre-processes instruction streams and retires processed instructions into a buffer for the main thread to consume. The subordinate thread executes a smaller version of the program executed by the main thread. As a result, it runs far ahead to warm up the data caches and fix branch mispredictions for the main thread. In-flight instructions are present in the subordinate thread, the buffer, and the main thread, forming a very large effective instruction window for single-thread out-of-order execution. Moreover, using a simple technique for identifying the subordinate thread's non-speculative results, the main thread can integrate those results directly into its state without having to execute the corresponding instructions. In this way, the main thread is sped up because it also executes a smaller version of the program, and the total number of instructions executed is minimized, thereby achieving efficient utilization of the hardware resources. The proposed SST architecture does not require large register files, issue queues, load/store queues, or reorder buffers. In addition, it incurs only minor hardware additions and changes. Experimental results show remarkable latency-hiding capabilities of the proposed SST architecture, outperforming existing architectures that share a similar high-level microarchitecture.

We developed two extensions of our SST scheme, yielding two additional microarchitectures. In the first extension, we developed a simple way to allow the subordinate thread to be aware of its own speculation. A speculation-aware subordinate thread is capable of identifying instructions that are more likely to produce invalid values, and so may skip their execution. In the second extension, we allow a subordinate thread to have its own subordinate thread. The main thread and multiple subordinate threads are arranged in a hierarchy based on the degree of their speculation, with the most speculative subordinate thread at the bottom of the hierarchy and the least speculative thread (the main thread) at the top. This new microarchitecture, named Hierarchical Symbiotic Subordinate Threading, combines the speed of highly speculative subordinate threads with the accuracy of less speculative subordinate threads.
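
The buffer handoff described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the dissertation's hardware design: the three-field instruction format (destination and two source registers, always an add), the `unknown` register set, and the function names `subordinate_run` and `main_run` are all hypothetical and chosen for illustration. The model shows the one idea the abstract states: results the subordinate thread marks as non-speculative are integrated by the main thread without re-execution, while the rest are executed by the main thread itself.

```python
from collections import deque


def subordinate_run(program, regs, unknown):
    """Run ahead over `program` and return a buffer of pre-processed entries.

    Assumption of this toy model: `unknown` is the set of registers whose
    values the subordinate thread did not wait for, so anything derived
    from them is treated as speculative.
    """
    regs = dict(regs)        # private copy; the main thread's state is untouched
    unknown = set(unknown)
    buffer = deque()
    for dest, a, b in program:
        if a in unknown or b in unknown:
            unknown.add(dest)                        # result would be speculative
            buffer.append((dest, a, b, None))        # main thread must execute it
        else:
            unknown.discard(dest)
            regs[dest] = regs[a] + regs[b]
            buffer.append((dest, a, b, regs[dest]))  # non-speculative result
    return buffer


def main_run(buffer, regs):
    """Consume the buffer; return (final registers, instructions executed)."""
    regs = dict(regs)
    executed = 0
    while buffer:
        dest, a, b, value = buffer.popleft()
        if value is None:
            regs[dest] = regs[a] + regs[b]  # execute in the main thread
            executed += 1
        else:
            regs[dest] = value              # integrate the result directly
    return regs, executed


# Hypothetical three-instruction program: each entry means dest = a + b.
program = [("r4", "r1", "r2"), ("r5", "r3", "r4"), ("r6", "r4", "r4")]

# The subordinate thread runs ahead without the real value of r3.
buffer = subordinate_run(program, {"r1": 1, "r2": 2, "r3": 0}, unknown={"r3"})

# The main thread has the real r3 and only executes what depends on it.
final_regs, executed = main_run(buffer, {"r1": 1, "r2": 2, "r3": 10})
```

In this example only the instruction that depends on `r3` is executed by the main thread; the other two results are taken from the buffer. The model leaves out everything else the abstract mentions, such as cache warming, branch outcome correction, and the bounded size of a real buffer.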
Keywords/Search Tags:Subordinate thread, SST, Cores