
Research On Loop Optimization In Compiling System Of Coarse-Grained Reconfigurable Architectures

Posted on: 2016-03-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D J Liu
GTID: 1318330536450184
Subject: Electronic Science and Technology
Abstract/Summary:
The Coarse-Grained Reconfigurable Architecture (CGRA) combines the flexibility of a general-purpose processor (GPP) with the high performance of an application-specific integrated circuit (ASIC). According to their interconnection scheme, CGRAs can be further divided into row-based CGRAs and array-based CGRAs. Computation-intensive applications are often mapped onto CGRAs for acceleration. In these applications, loops usually account for most of the execution time, so optimizing loop mapping onto CGRAs is of great importance. Loop mapping onto a CGRA is very challenging for two reasons. First, the hardware architecture of a CGRA is quite different from that of a GPP or an FPGA. Second, loops, especially nested loops, have complicated data dependences. Targeting the two typical kinds of CGRA, row-based and array-based, this dissertation proposes three approaches based on the polyhedral model to improve the execution performance of loops. The approaches fall into two categories: spatial mapping and temporal mapping.

The first approach is polyhedral-model-based spatial mapping for row-based CGRAs. By analyzing the factors that influence the performance of a row-based CGRA, we first establish a performance model of loop mapping onto a row-based CGRA that reflects both the loop transformation coefficients and the hardware constraints. A novel search strategy is also designed to find the optimal result efficiently. Finally, we build a complete flow for mapping loop nests onto the CGRA. Experimental results on most of the PolyBench kernels show that the proposed approach improves the performance of the kernels by 42% on average compared with state-of-the-art methods.

The second and third approaches both use temporal mapping to improve the software pipelining performance of loops on array-based CGRAs. In the second approach, joint iteration-wise affine transformation and software pipeline merging are proposed to improve the PE utilization rate and reduce memory access overhead. In the third approach, operator-wise affine transformation is proposed to exploit parallelism not only at the loop level but also at the operator level. In addition, the length of every dependence in the loop can be expressed through this operator-wise affine transformation, which is very helpful for optimizing how data dependences are transmitted. Experimental results show that these two approaches improve the performance of the loop kernels by 71% and 96% on average, respectively, compared with state-of-the-art methods.

Through the three approaches above, we can develop a unified and complete loop mapping method to improve loop execution performance on CGRAs.
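As a rough illustration of what an iteration-wise affine transformation does in the polyhedral model, the sketch below applies a transformation matrix to the dependence distance vectors of a two-level loop nest and checks whether the inner loop becomes parallel. It is not the dissertation's algorithm: the example loop, the matrix `skew`, and the helper names `apply_affine`, `lex_positive`, and `analyze` are all assumptions chosen for illustration.

```python
# Illustrative only: a 2-D loop nest such as
#   for i: for j: a[i][j] = a[i-1][j] + a[i][j-1]
# has dependence distance vectors (1, 0) and (0, 1).

def apply_affine(T, d):
    """Multiply the transformation matrix T by a dependence distance vector d."""
    return tuple(sum(T[r][c] * d[c] for c in range(len(d))) for r in range(len(T)))

def lex_positive(v):
    """A dependence is preserved if its transformed vector is lexicographically positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the all-zero vector is not lexicographically positive

def analyze(T, deps):
    """Return (transformed dependences, legality, whether inner loops are parallel)."""
    transformed = [apply_affine(T, d) for d in deps]
    legal = all(lex_positive(v) for v in transformed)
    # Inner loops can run in parallel when the outermost loop carries every dependence.
    inner_parallel = legal and all(v[0] > 0 for v in transformed)
    return transformed, legal, inner_parallel

deps = [(1, 0), (0, 1)]
identity = [[1, 0], [0, 1]]
skew = [[1, 1], [0, 1]]  # hypothetical choice: new outer index is i + j (wavefront)

print(analyze(identity, deps))  # inner loop still carries the (0, 1) dependence
print(analyze(skew, deps))      # both dependences are carried by the outer loop
```

With the identity transformation the inner loop carries a dependence, whereas after skewing every dependence is carried by the outer loop, so the iterations of the inner loop could be spread across processing elements. An operator-wise transformation would, under the same idea, assign a separate affine schedule to each operation rather than to each whole iteration.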
Keywords/Search Tags: CGRA, Loop Optimization, Polyhedral Model, Software Pipelining