
Reconfigurable Neural Network System Based On Distributed Storage

Posted on: 2022-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Shi
Full Text: PDF
GTID: 2568306326476454
Subject: Circuits and Systems
Abstract/Summary:
In recent decades, convolutional neural networks (CNNs) have developed rapidly and are widely used in image classification, pattern recognition, multi-target detection, and other fields. As these applications expand, higher demands are placed on the running speed of CNNs, and hardware accelerators are the main technical means of meeting them. Because CNNs take different structures in different application scenarios, reconfigurable hardware accelerators that support multiple CNN structures have more market value than traditional non-reconfigurable accelerators.

The core strategy of CNN hardware acceleration is parallel computation. In this thesis, multiple forms of parallelism are exploited simultaneously by adding operation units within layers, between layers, and across channels. Such extensive parallelism increases the accelerator's data-transfer requirements, so external DDR bandwidth and on-chip storage become the primary factors limiting computing speed. To address the on-chip bandwidth and storage problem, a distributed storage architecture is proposed: the on-chip storage resources are partitioned into an input cache, a temporary cache, and an output cache. The temporary cache holds the intermediate data of the convolution computation, avoiding the DDR bandwidth bottleneck that arises from first writing intermediate results to DDR and then reading them back. When the input and output caches are properly coordinated, the DDR interface is never idle and DDR bandwidth is utilized to the maximum extent. In addition, block convolution of large input feature maps is proposed to reduce the demand for on-chip cache resources. The reconfigurable hardware system realizes the reconfigurable convolutional neural network by controlling the operation
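The block-convolution idea above can be sketched in software as follows. This is a minimal NumPy illustration of tiling with a kernel halo, not the thesis's hardware design; the tile size and the helper names (`conv2d`, `block_conv2d`) are assumptions for the sketch. The point is that each output tile only needs a (tile + kernel − 1)-sized input block resident on chip, rather than the whole feature map.

```python
import numpy as np

def conv2d(x, k):
    """Plain 'valid' 2-D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def block_conv2d(x, k, tile=8):
    """Convolve x tile by tile: only a (tile + kh - 1) x (tile + kw - 1)
    input block is needed at a time, instead of the full feature map."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for ti in range(0, oh, tile):
        for tj in range(0, ow, tile):
            # Each output tile needs an input block extended by the kernel halo.
            h = min(tile, oh - ti)
            w = min(tile, ow - tj)
            block = x[ti:ti + h + kh - 1, tj:tj + w + kw - 1]
            out[ti:ti + h, tj:tj + w] = conv2d(block, k)
    return out
```

Because adjacent tiles share only the halo rows and columns, the result is bit-identical to convolving the whole feature map at once, while the peak on-chip buffer requirement drops from the full feature-map size to one haloed tile.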
process through an instruction program. In terms of reconfiguration granularity, a fine-grained instruction set makes the reconfiguration process more flexible but reduces the throughput of the convolutional network; therefore, a coarse-grained reconfigurable instruction set is designed in this thesis to control the hardware reconfiguration of various convolutional neural networks. In accelerator design, hard-coding the convolution kernel size affects the performance of the reconfigured network, so a parameterized method is adopted to fix the kernel size, which eases extension and porting of the accelerator.

Using the distributed storage architecture and the reconfigurable CNN instruction set, a general reconfigurable CNN hardware accelerator is designed. Four convolutional neural networks (LeNet, AlexNet, VGG16, and ResNet) were reconfigured on this general reconfigurable convolutional neural network (RNC) hardware accelerator. The experimental results show that the block convolution strategy greatly reduces the use of on-chip storage resources, and that distributed storage preserves both parallelism and network performance.
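A coarse-grained instruction in the sense described above configures an entire convolutional layer at once, rather than issuing per-operation commands. The following sketch models what such an instruction might carry and what the control unit would derive when decoding it; all field names, widths, and addresses are illustrative assumptions, not the thesis's actual instruction format.

```python
from dataclasses import dataclass

@dataclass
class LayerInstr:
    """One coarse-grained instruction: reconfigures a whole conv layer.
    Field names and the address fields are illustrative assumptions."""
    in_h: int
    in_w: int
    in_ch: int
    out_ch: int
    kernel: int    # square kernel size (fixed per build via parameterization)
    stride: int
    in_base: int   # input-cache base address
    out_base: int  # output-cache base address

def out_shape(instr: LayerInstr):
    """What a control unit would compute when decoding the instruction."""
    oh = (instr.in_h - instr.kernel) // instr.stride + 1
    ow = (instr.in_w - instr.kernel) // instr.stride + 1
    return oh, ow, instr.out_ch

# Example: a LeNet-style first layer described by a single instruction.
instr = LayerInstr(in_h=32, in_w=32, in_ch=1, out_ch=6,
                   kernel=5, stride=1, in_base=0x0000, out_base=0x4000)
```

Reconfiguring the accelerator for a different network (say, VGG16 instead of LeNet) then amounts to loading a different short sequence of such layer instructions, which is what keeps the reconfiguration overhead low compared with a fine-grained instruction stream.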
Keywords/Search Tags:Neural Network, Temporary Cache, Reconfigurable, Convolution Acceleration