There are many kinds of computer vision tasks and diverse algorithms. Among these algorithms, traditional Image Filtering, Neural Network, and Stereo Matching algorithms play fundamental roles. Running these algorithms requires hardware support, but general-purpose CPU and GPU processors may suffer from limited performance and low energy efficiency, while dedicated accelerators lack the flexibility to support a variety of algorithms. Targeting the diverse scenarios of computer vision, this thesis studies a multi-core accelerator for the above three kinds of algorithms that pursues both high energy efficiency and flexibility.

For the first goal, high energy efficiency, a customized vision-algorithm accelerator core is designed according to the target algorithms, and the overall performance of the accelerator is improved through a homogeneous multi-core architecture. A Multi-Core Ring Bus connecting the cores supports a variety of low-latency data communication modes, enabling sufficient data reuse and task parallelism. In addition, the cache resources within the accelerator cores can be shared over the bus, reducing the number of accesses to external DRAM. Inside each accelerator core, a memory unit that supports ping-pong operation and a compute unit with a 2-D ring topology deliver high data throughput. Through these memory and compute designs, high MAC utilization and high energy efficiency are achieved.

For the second goal, flexibility, the idea of software and hardware co-design is adopted. A special instruction set for traditional Image Filtering, Neural Network, and Stereo Matching is designed, and the instruction stream controls the hardware accelerator to perform data transfer, computation, and other operations. The instructions not only flexibly configure the communication modes of the Multi-Core Ring Bus, but also configure each accelerator core to execute different operators and schedule multiple data streams within a core. Moreover, an instruction compiler tailored to the characteristics of the hardware architecture is designed to generate the accelerator's instruction streams; it performs optimization analysis and converts the above three kinds of algorithm tasks into machine code.

Simulation results show that the proposed multi-core accelerator can accelerate traditional Image Filtering, Neural Network, and Stereo Matching algorithms. This thesis also builds an FPGA-based dual-core accelerator platform to verify and test its performance. For the AlexNet algorithm, the Gaussian Filter algorithm, and the "AD" Matching algorithm, the dual-core accelerator reaches 40 GOPS, 50 GOPS, and 46 GOPS respectively, with energy efficiency of 2.0 GOPS/W, 2.4 GOPS/W, and 2.2 GOPS/W respectively. Compared with CPUs and GPUs, the overall energy efficiency of the proposed accelerator is greatly improved and approaches the level of dedicated AI accelerators.
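The ping-pong memory operation mentioned above can be illustrated with a minimal software sketch: two buffers alternate roles each step, so loading the next tile overlaps with computing on the current one. The function and names below are illustrative assumptions, not the thesis's actual hardware interfaces.

```python
def process_stream(tiles, compute):
    """Process a sequence of tiles using two alternating (ping-pong) buffers."""
    buffers = [None, None]      # the "ping" and "pong" buffers
    results = []
    buffers[0] = tiles[0]       # initial load fills the first buffer
    for i in range(len(tiles)):
        cur = i % 2             # buffer the compute unit reads this step
        nxt = (i + 1) % 2       # buffer the memory unit fills this step
        if i + 1 < len(tiles):
            buffers[nxt] = tiles[i + 1]   # load overlaps with compute
        results.append(compute(buffers[cur]))
    return results

# Example usage: square each element of each tile.
out = process_stream([[1, 2], [3, 4]], lambda t: [x * x for x in t])
```

In hardware, the load into the idle buffer and the computation on the active buffer proceed in parallel, which is what keeps the compute unit's MAC array busy.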
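The reported throughput and energy-efficiency figures imply a roughly constant power budget for the dual-core platform; dividing throughput (GOPS) by energy efficiency (GOPS/W) gives the implied power draw, as a quick arithmetic check:

```python
# Implied power draw of the dual-core accelerator for each workload,
# computed from the throughput and energy-efficiency figures above.
workloads = {
    "AlexNet":         (40, 2.0),   # (GOPS, GOPS/W)
    "Gaussian Filter": (50, 2.4),
    "AD Matching":     (46, 2.2),
}
for name, (gops, gops_per_w) in workloads.items():
    watts = gops / gops_per_w
    print(f"{name}: {watts:.1f} W")   # all land near 20 W
```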