| As semiconductor process technology advances, the cost of chip design keeps rising, and the demand for “application defines chip” is increasingly urgent. Reconfigurable chips balance computing efficiency and software programmability, meeting this demand by supporting both “application defines software” and “software defines chip”. The compiler backend, which is responsible for automatically mapping applications onto multiple reconfigurable processing units, plays an important role in reconfigurable chips. Increasing the number of processing elements (PEs) is an inevitable trend for reconfigurable chips, but it also brings new requirements and challenges to compiler backend design. This paper focuses on a large-scale coarse-grained reconfigurable architecture with 1024 PEs. Because of its new architectural features, including a heterogeneous memory access unit design, more restricted PE interconnection, and a multi-level pipeline design, the existing compiler backend design is no longer applicable. This paper therefore designs and implements a new compiler backend. Built on the LLVM compiler framework, the new backend tool automatically extracts the loops of compute-intensive applications and constructs the data flow graph (DFG). After the DFG is preprocessed, scheduled, and mapped, the configuration package needed for execution on the target reconfigurable chip is generated. This backend tool, together with the other tools used in the compilation process, is integrated into a complete compiler for the target system. The compiler can directly generate executable files for the three-layer instruction set architecture, which improves its usability. To make full use of the abundant parallel resources of the target architecture, this paper explores and implements two backend optimization strategies that address the high cost of instruction switching and synchronization. A new backend optimization strategy based on
DFG splitting is proposed for small-scale data flow graphs. The strategy considers both memory-aware optimization and data repartitioning, greatly improving the final application speedup. For large-scale data flow graphs, an instruction similarity optimization algorithm based on simulated annealing is proposed and combined with a segmented instruction switching structure; it can greatly reduce instruction storage overhead. This paper implements automatic compilation of typical compute-intensive applications onto the target architecture. The compiled executable files run correctly as input to the RTL simulation environment, and the applications achieve a speedup of 23.2 times over general-purpose processors. For the two optimization strategies, comparing simulation data on performance and instruction similarity shows, on average, a performance improvement of 129% for the first, and an instruction similarity improvement of 56.97% with a 72.32% reduction in instruction storage overhead for the second. |
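
To illustrate the idea behind the simulated-annealing instruction similarity optimization, the following is a minimal Python sketch and not the paper's algorithm. It assumes a simplified model that the abstract does not specify: each instruction segment is a list of opcodes, one per PE slot, the opcodes within a segment may be freely permuted across slots, and the cost is the number of slots whose opcode changes between consecutive segments. The names `switch_cost` and `anneal`, the swap move, and the geometric cooling schedule are all hypothetical choices made for illustration.

```python
import math
import random


def switch_cost(segments):
    """Count PE slots whose opcode differs between consecutive segments."""
    return sum(
        a != b
        for prev, nxt in zip(segments, segments[1:])
        for a, b in zip(prev, nxt)
    )


def anneal(segments, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Permute opcodes inside each segment to reduce switch_cost.

    Assumed model: any opcode may sit on any PE slot (real hardware
    imposes interconnect and memory-unit constraints that are ignored here).
    """
    rng = random.Random(seed)
    current = [list(s) for s in segments]
    cost = switch_cost(current)
    best, best_cost = [list(s) for s in current], cost

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)

        # Candidate move: swap two PE slots inside one random segment.
        seg = rng.randrange(len(current))
        i, j = rng.sample(range(len(current[seg])), 2)
        current[seg][i], current[seg][j] = current[seg][j], current[seg][i]

        new_cost = switch_cost(current)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the move
            if cost < best_cost:
                best, best_cost = [list(s) for s in current], cost
        else:
            # Reject the move: undo the swap.
            current[seg][i], current[seg][j] = current[seg][j], current[seg][i]

    return best, best_cost


if __name__ == "__main__":
    demo = [
        ["add", "mul", "ld", "st"],
        ["mul", "add", "st", "ld"],
        ["ld", "st", "add", "mul"],
    ]
    placed, cost = anneal(demo)
    print("cost before:", switch_cost(demo), "cost after:", cost)
```

In this toy model, a lower cost means consecutive segments share more per-slot instructions, so fewer instruction words would need to be stored or reloaded at each switch. That is the intuition behind pairing similarity optimization with segmented instruction switching.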