Font Size: a A A

High performance termination detection techniques supporting multithreaded execution

Posted on:2001-02-26Degree:Ph.DType:Dissertation
University:University of Central FloridaCandidate:Tseng, YiliFull Text:PDF
GTID:1466390014955118Subject:Computer Science
Abstract/Summary:
Efficient detection of execution termination is essential for optimizing throughput of multithreaded parallel computer architectures. Points at which synchronization occur are referred to as synchronization barriers . The design objective is to minimize the amount of overhead required to enforce completion of each barrier prior to the resumption of subsequent processing.; This dissertation begins by developing a novel taxonomy for termination detection techniques based on thread allocation strategy and degree of processor reactivation support. A capability class hierarchy ranging from Static-Binding Idle-Tasking to Dynamic-Binding Any-Tasking is derived as a result of the taxonomy. Together they assist significantly in identification of properties which facilitate algorithm assessment and refinement. An optimality analysis indicates that as few as (E-N) additional messages can be utilized to realize dynamic binding rather than static binding of threads on N PEs with E reporting events.; The Tiered Detection Algorithm is shown to approach practical efficiency limits and is further refined in terms of its global invariant across non-serializable message communication channels. By attaching the level of thread nesting to thread consumption and production counts, it prevents false termination hazards. Its advantage in detection delay is revealed in average and worst cases over CV and LTD algorithms concerning the traversal of processor hierarchy and implementation performance of the Credit Algorithm.; The Tiered algorithm is then extended to a hardware-based approach. The Distributed-Sum Bit-Comparison logic (DSBC) developed is capable of supporting dynamic allocation of tasks for multithreaded execution on shared-memory, message-passing, and/or single-chip multiprocessors. For a system of N PEs, a single instance of global logic and N instances of local logic interconnected by 3N wires are shown to provide direct support to the compiler and programmer for any arbitrary number of barriers. DSBC detection time upon completion of the last task is shown to scale linearly in terms of the number of active barriers in the system. Comparison to Wired-NOR hardware approaches demonstrate reduced barrier detection time, decreased inter-PE wiring requirements, and increased functionality. Finally, a version is designed using Null Convention Logic to provide a delay-insensitive alternative implementation that eliminates race conditions and timing considerations in distributed environments.
Keywords/Search Tags:Detection, Termination, Multithreaded, Logic
Related items