| As a source of performance for modern computing systems, coprocessors provide increasingly valuable computing power. In most cases, a coprocessor exists as a standalone unit and receives computational tasks from the main processor. Common coprocessors include GPUs (Graphics Processing Units), ASICs (Application-Specific Integrated Circuits), and FPGAs (Field-Programmable Gate Arrays), but each has shortcomings in performance, power consumption, hardware scale, or flexibility. The CGRA (Coarse-Grained Reconfigurable Array), which offers both high energy efficiency and high flexibility, has therefore attracted growing attention, because its hardware structure can be adapted to the input application. In addition, the instruction set of the main processor, the CPU (Central Processing Unit) connected to the coprocessor, is an important factor in overall system performance. The widely used x86 and ARM instruction sets suffer from complex instructions, limited extensibility, and expensive licensing fees. To address these problems, this paper adopts a "CPU + CGRA" architecture to design a system of the "main processor + coprocessor" form. The research work covers three aspects. (1) The main processor (CPU) is the control center of the system and uses a RISCY (also called RI5CY) processor that implements the RISC-V instruction set. It is an efficient processor with a Harvard architecture and a four-stage pipeline. This paper extends the RISC-V instruction set with custom instructions that optimize the control logic by which the main processor drives the coprocessor and improve execution efficiency. Through combined hardware and software design, these custom extended instructions execute correctly, and the extended instruction format designed in this paper supports up to 128 custom instructions. (2) The coprocessor is the computing center of the system; it adopts a CGRA architecture specialized for processing the input application. Because current CGRA design is highly complex and requires manual tuning, this paper proposes an automatic CGRA design method. The method first analyzes the input application and outputs optimization suggestions for the hardware circuit design, then automatically generates the corresponding hardware circuits from those suggestions, thereby automating the design of the coprocessor. To save system resources and reduce area overhead, the coprocessor is integrated inline into the main processor. (3) This paper builds a hardware-software co-simulation environment that simulates the system and reports the execution time of the input application. It also automatically calculates and extracts the coprocessor's performance parameters once the automatic design of the coprocessor is complete. Experimental data show that, in TSMC's 65 nm process and compared with a standalone RISCY processor, the proposed system achieves speedups of 1.29×, 1.39×, and 1.46×, with corresponding area increases of 3%, 5%, and 7%. These results indicate that the automatically designed CGRA coprocessor meets the expected research goals. |
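The abstract says the extended instruction format supports up to 128 custom instructions, but it does not give the bit layout. The sketch below is only an illustration. It assumes the instructions use the standard RISC-V R-type layout in the custom-0 major opcode, with the 7-bit `funct7` field selecting one of 2^7 = 128 operations. The helper names (`encode_custom_rtype`, `decode_custom_rtype`) and this field assignment are hypothetical and are not taken from the paper's actual format.

```python
# Hypothetical encoding for custom RISC-V extension instructions.
# ASSUMPTION: standard R-type layout in the custom-0 opcode space,
# with funct7 acting as a 7-bit selector (2**7 = 128 instructions).

CUSTOM_0 = 0b0001011  # major opcode reserved by RISC-V for custom extensions

# A 7-bit selector field yields at most 128 distinct custom instructions.
MAX_CUSTOM_INSTRUCTIONS = 1 << 7


def encode_custom_rtype(funct7: int, rs2: int, rs1: int, funct3: int,
                        rd: int, opcode: int = CUSTOM_0) -> int:
    """Pack R-type fields into a 32-bit instruction word."""
    fields = {
        "funct7": (funct7, 7), "rs2": (rs2, 5), "rs1": (rs1, 5),
        "funct3": (funct3, 3), "rd": (rd, 5), "opcode": (opcode, 7),
    }
    # Reject any value that would overflow its bit field.
    for name, (value, width) in fields.items():
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def decode_custom_rtype(word: int) -> dict:
    """Split a 32-bit instruction word back into its R-type fields."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


if __name__ == "__main__":
    word = encode_custom_rtype(funct7=5, rs2=12, rs1=11, funct3=0, rd=10)
    print(hex(word))  # 0xac5850b
    print(decode_custom_rtype(word))
```

In this assumed scheme, `encode_custom_rtype(funct7=5, rs2=12, rs1=11, funct3=0, rd=10)` returns `0x0AC5850B`. The main processor's decoder would forward any word whose opcode equals `CUSTOM_0` to the coprocessor control logic and use `funct7` to select the operation. Reusing the R-type layout keeps the two source registers and one destination register in their standard positions, which would let the existing register-file read and write-back paths stay unchanged.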