Tiled Processor Architecture (TPA), a many-core architecture design with good scalability, copes well with challenges such as power consumption, wire delay, and design and verification complexity in nanometer-scale technology. In a Tiled Processor Architecture, the L1 Data Cache has an important effect on performance, motivating further research on latency reduction, communication and synchronization, memory ambiguity, and scalability in memory system design. This thesis explores the design space of the L1 Data Cache in the Tiled Processor Architecture for Instruction-level parallelism (TPA-PI), proposing a design framework and optimizations based on a quantitative analysis of the factors affecting performance. The research and results of this thesis include:

(1) After a thorough survey of previously proposed L1 Data Cache designs for Tiled Processor Architectures, we present the design of the L1 Data Cache for TPA-PI, an instance of the Tiled Processor Architecture. The L1 Data Cache consists of four banks partitioned by address interleaving, which supports high memory-access bandwidth and reduces synchronization overhead.

(2) Based on an analysis of the data dependence characteristics of the block execution model in TPA-PI, we propose the first optimization to the L1 Data Cache in TPA-PI: an improved memory dependence predictor. Simulation results show that this optimization raises the accuracy of memory dependence prediction for most applications.

(3) Based on an analysis of the memory access features of the block execution model in TPA-PI, we propose the second optimization to the L1 Data Cache in TPA-PI: reducing memory access latency by means of a data prefetch mechanism. Simulation results show that this optimization remarkably reduces the latency of memory-access instructions.

Experimental results on a subset of the SPEC CPU 2000 benchmarks show that both optimizations suit most applications while incurring very little on-chip resource overhead.