| With the rapid development of deep learning, network structures have become deeper and wider, and the amount of computation has begun to explode. The traditional general-purpose processor (CPU) uses the von Neumann architecture; although it is flexible, memory access becomes its bottleneck. The GPU uses a large number of arithmetic logic units (ALUs); although computing efficiency improves, power consumption increases in proportion to throughput. In the embedded field, limited power, bandwidth, and on-chip resources pose a major challenge. Building dedicated hardware processors for convolutional neural networks (CNNs) has therefore become an active research direction. An ASIC achieves high throughput and low power consumption through a high degree of customization, but its development cycle is longer and the cost of a custom chip is higher. An FPGA offers comparable advantages in throughput and power consumption together with a short development cycle and reconfigurability, and it is widely used in the design of CNN accelerators. We propose a CNN acceleration architecture for hardware with limited resources. While retaining the acceleration effect, it minimizes the bandwidth and on-chip resource requirements. Its main innovations are as follows: 1. Mixed-pipeline design, which applies pipelining at both the macro and micro levels to make full use of temporal parallelism. 2. Pipelined convolution architecture for multiple kernel sizes. The convolution unit supports common convolution kernel sizes and internally uses a 6-stage pipeline to shorten the accumulation of intermediate results, so that it can produce one result per clock cycle. 3. High-reuse on-chip cache architecture. Each feature-map pixel needs to be loaded only once, so data reuse is high. Complete feature maps do not need to be stored on-chip, so on-chip resource requirements are low. 4. Flexible and efficient software/hardware system architecture. Based on the Zynq SoC, the common parameters of a convolutional neural network can be configured flexibly in software, so that different network structures can be switched easily. We implement this architecture on the PYNQ experimental platform, which integrates a Xilinx Zynq-7020 system-on-chip. The processing system (PS) and programmable logic (PL) are connected through the AXI bus to facilitate software/hardware collaboration. We have tested the architecture on the MNIST and CIFAR-10 datasets; the results show low bandwidth and on-chip resource requirements, low power consumption, and flexible software switching between the two networks. |
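
The load-once property claimed for the on-chip cache can be illustrated with a small software model. The sketch below is not the hardware design described above; it is a minimal Python illustration that assumes a row-major pixel stream, stride 1, no padding, and the usual CNN convention of not flipping the kernel. The function name `stream_conv2d` and the `loads` counter are hypothetical and introduced only for this example. It buffers just the K most recent rows rather than the whole feature map, and it counts how many times pixels are read from the stream.

```python
from collections import deque


def stream_conv2d(pixels, width, kernel):
    """Software model of a line-buffered K x K convolution (illustrative only).

    pixels: iterable yielding the feature map in row-major order.
    width:  number of pixels per row.
    kernel: K x K list of lists (applied without flipping, stride 1, no padding).
    Returns (output rows, number of pixels read from the stream).
    """
    k = len(kernel)
    rows = deque(maxlen=k)  # line buffer: holds only the K most recent rows
    current = []            # row currently being filled from the stream
    out = []
    loads = 0               # counts reads from the external stream
    for p in pixels:
        loads += 1
        current.append(p)
        if len(current) == width:
            rows.append(current)  # the oldest row is evicted automatically
            current = []
            if len(rows) == k:
                # Enough rows are buffered to emit one full output row.
                out_row = []
                for x in range(width - k + 1):
                    acc = 0
                    for i in range(k):
                        for j in range(k):
                            acc += rows[i][x + j] * kernel[i][j]
                    out_row.append(acc)
                out.append(out_row)
    return out, loads


# 4 x 4 feature map with values 0..15, 3 x 3 kernel of ones.
result, loads = stream_conv2d(iter(range(16)), 4, [[1, 1, 1]] * 3)
print(result, loads)  # prints [[45, 54], [81, 90]] 16
```

In this model the buffer holds K rows of `width` values regardless of the feature-map height, and the 16 input pixels are each read exactly once. A hardware line buffer would typically pair a similar row store with a K x K register window feeding the pipelined multiply-accumulate stages.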