
Design And Implementation Of Data-Parallel Memory System For Tiled Stream Processor

Posted on: 2010-01-16  Degree: Master  Type: Thesis
Country: China  Candidate: F Wang  Full Text: PDF
GTID: 2178360302459740  Subject: Computer system architecture
Abstract/Summary:
Advances in VLSI technology have rapidly increased the computing speed of modern processors. Memory access speed, meanwhile, has improved slowly, giving rise to the "Memory Wall" problem. The limited off-chip bandwidth of the memory system has become a major obstacle to improving application performance. The problem is more severe for the Tiled Stream Processor, partly because memory access time accounts for a large percentage of the total execution time of data-parallel applications, and partly because traditional memory system designs do not capture the characteristics of such applications. When the peak off-chip bandwidth is fixed, optimizing the memory system design to improve bandwidth efficiency, and consequently to reduce the time that computation spends waiting, is key to improving stream processor performance.
This dissertation focuses on the analysis, design and implementation of a data-parallel memory system for the Tiled Stream Processor. The major research contents and achievements cover the following aspects. Firstly, based on the memory access model and architectural features of the stream processor, as well as the memory access characteristics of data-parallel applications, we have qualitatively analyzed how a memory hierarchy with overlapped computation and memory access hides latency and improves bandwidth. Secondly, on a simulation platform, we have quantitatively studied and evaluated the influence of the primary design parameters on memory performance under various access patterns. The experiments show that the parameters that are sensitive to the access pattern should be configured according to the locality and parallelism of the application. Thirdly, with the aim of improving the utilization of off-chip bandwidth, we have designed and implemented a data-parallel memory system for the Tiled Stream Processor in our current project.
This memory system efficiently reduces the total number of off-chip memory accesses through hierarchical scheduling, which eases the demand for off-chip bandwidth. The results of software simulation and emulation verification indicate that, for workloads with different characteristics, our design can fully exploit the row locality and bank parallelism of memory accesses by optimizing the configuration parameters, making the most efficient use of DRAM bandwidth and ultimately boosting the performance of the whole Tiled Stream Processor system.
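To make the notion of row locality concrete, the following is a minimal toy model, not the memory system described in the thesis. It counts how often a DRAM bank must open a new row for a given access order, and shows how grouping pending requests by bank and row reduces that count. The address mapping, the bank count, the row size and the function names (`decode`, `count_row_misses`, `reorder_by_row`) are all assumptions made for illustration; the abstract does not specify the actual scheduling policy.

```python
# Toy model of DRAM row locality. All parameters and names here are
# hypothetical and chosen for illustration only.

def decode(addr, num_banks=4, row_size=8):
    """Map a word address to (bank, row).

    Assumed layout: each row holds `row_size` words, and consecutive
    rows are interleaved across `num_banks` banks.
    """
    row_index = addr // row_size
    return row_index % num_banks, row_index // num_banks


def count_row_misses(addresses, num_banks=4, row_size=8):
    """Count accesses that must open a row other than the one currently open."""
    open_row = {}  # bank -> row currently held in that bank's row buffer
    misses = 0
    for addr in addresses:
        bank, row = decode(addr, num_banks, row_size)
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row
    return misses


def reorder_by_row(addresses, num_banks=4, row_size=8):
    """Group pending requests by (bank, row), keeping first-seen group order.

    This ignores data dependencies between requests, which a real
    scheduler would have to respect.
    """
    groups = {}
    for addr in addresses:
        key = decode(addr, num_banks, row_size)
        groups.setdefault(key, []).append(addr)
    return [addr for group in groups.values() for addr in group]


# A pattern that alternates between two rows of the same bank.
pattern = [0, 32, 1, 33, 2, 34, 3, 35]

print(count_row_misses(pattern))                  # 8: every access opens a row
print(count_row_misses(reorder_by_row(pattern)))  # 2: one row opening per row
```

In this sketch the unscheduled order forces a row opening on every access, while the grouped order needs only one per distinct row. A scheduler that also spreads requests across banks could overlap those row openings, which is the bank parallelism the abstract refers to; that aspect is not modeled here.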
Keywords/Search Tags: Tiled Stream Processor Architecture, data-parallel memory system, off-chip memory access bandwidth, locality