Coarse-Grained Reconfigurable Architectures (CGRAs) are promising parallel computing architectures that combine the high energy efficiency of Application-Specific Integrated Circuits (ASICs) with the high flexibility of General-Purpose Processors (GPPs). The computation-intensive portions of an application, such as loops, are usually executed on a CGRA for acceleration. Modulo scheduling is commonly used for loop mapping. It improves loop execution performance by reducing the Initiation Interval (II) between the starts of adjacent loop iterations. Although a CGRA is energy efficient from a hardware perspective, the performance actually achieved depends heavily on the compilation tools, and loop mapping optimization is crucial to the quality of the compiled result.

Many loops in real programs are imperfectly nested. For such loops, existing methods suffer from a limited scope of application, high additional overhead, and a failure to consider transformations of the input loop nest, all of which lead to low performance. Improving the execution performance of imperfectly nested loops on CGRAs is therefore of great significance.

To address these problems, we propose a polyhedral-based pipelining approach for imperfectly nested loops on CGRAs. Based on the polyhedral model, we formulate the optimization of an imperfectly nested loop on a CGRA as a mathematical model and find all legal transformations of the original loop. We also design a performance model tied to the specific hardware architecture and use the Total Execution Time (TET) as the performance metric to evaluate the theoretical performance of every legal transformation before modulo scheduling. Guided by these estimates, transformations are selected in order of decreasing theoretical performance and modulo-scheduled on the CGRA until a mapping succeeds. This avoids the large overhead of actually modulo scheduling every legal transformation.

On a 4 × 4 mesh-connected CGRA, experimental results show that our approach reduces the TET of nested loops by 50.1% on average compared with state-of-the-art techniques, while the compilation time remains moderate in practice, which demonstrates the effectiveness of our method. The results of this thesis improve the execution performance of CGRAs for more general applications and lay a foundation for extending CGRAs toward general application scenarios.
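The evaluate-then-schedule loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not give its TET formula, so `estimated_tet` assumes a common software-pipelining estimate: a modulo-scheduled inner loop with N iterations, initiation interval II and single-iteration schedule length L takes about (N − 1) × II + L cycles. To that it adds a hypothetical per-outer-iteration cost for statements outside the inner loop (the imperfect-nesting part). `Candidate`, `select_and_map`, the stub scheduler and all numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One legal transformation of the loop nest (all fields hypothetical)."""
    name: str
    outer_trips: int        # iterations of the outer, non-pipelined loop levels
    inner_trips: int        # iterations of the modulo-scheduled innermost loop
    est_ii: int             # estimated II, e.g. a lower bound on II
    sched_len: int          # estimated schedule length L of one iteration (cycles)
    outer_body_cycles: int = 0  # statements outside the inner loop (imperfect nesting)


def estimated_tet(c: Candidate) -> int:
    """Assumed TET model: the pipelined inner loop costs (N - 1) * II + L cycles,
    plus the outer-level statements, repeated once per outer iteration."""
    inner = (c.inner_trips - 1) * c.est_ii + c.sched_len
    return c.outer_trips * (inner + c.outer_body_cycles)


# A scheduler returns the achieved II on success, or None if mapping fails.
Scheduler = Callable[[Candidate], Optional[int]]


def select_and_map(cands: Sequence[Candidate],
                   modulo_schedule: Scheduler) -> Optional[Tuple[Candidate, int]]:
    """Try candidates from best to worst estimated TET; stop at the first success."""
    for c in sorted(cands, key=estimated_tet):
        achieved_ii = modulo_schedule(c)
        if achieved_ii is not None:
            return c, achieved_ii
    return None  # no transformation could be mapped


# Usage with made-up numbers and a stub scheduler that rejects candidate "C".
candidates = [
    Candidate("A", outer_trips=16, inner_trips=64, est_ii=2, sched_len=10, outer_body_cycles=4),
    Candidate("B", outer_trips=1, inner_trips=1024, est_ii=3, sched_len=12),
    Candidate("C", outer_trips=64, inner_trips=16, est_ii=1, sched_len=8, outer_body_cycles=4),
]
stub = lambda c: None if c.name == "C" else c.est_ii
chosen = select_and_map(candidates, stub)
print(chosen[0].name, chosen[1])  # prints "A 2"
```

Here "C" has the best estimate (1728 cycles) but the stub scheduler fails on it, so the next-best "A" (2240 cycles) is mapped and the costlier "B" (3081 cycles) is never scheduled. Only as many real modulo-scheduling attempts are made as it takes to reach the first success.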