As the era of big data arrives, data volumes are growing explosively, and organizing these data and extracting valuable information from them has become a major challenge. A deep learning model is a multi-layer structure that emulates the working style of the cerebral cortex. Deep learning models perform well at big-data mining, feature extraction, and object classification, and have consequently become some of the most popular models in the machine learning community.

Deep learning algorithms are typically compute-intensive, and so far most deep learning applications run on high-performance clusters. Computational complexity has become a critical barrier restricting the mass adoption of deep learning technology in big data applications, so research on accelerating deep learning algorithms has become a hot topic. Considering both performance and flexibility, the FPGA (Field Programmable Gate Array) is one of the most suitable acceleration platforms. Research on accelerating deep learning algorithms with FPGAs has only just begun, and has so far focused on accelerating individual algorithms. In this paper, we aim to develop a uniform platform that supports acceleration of general deep learning algorithms.

First, we analyzed the execution flow and operational characteristics of several typical deep learning algorithms and summarized their execution templates. Based on this analysis, we propose a Super-Vector co-Processor architecture for Deep Learning algorithms, SVP-DL for short. We designed an instruction set specifically for SVP-DL, with which target applications can be programmed and run on SVP-DL. Next, we describe how to map deep learning applications onto SVP-DL in terms of data organization and algorithm programming. Finally, we implemented SVP-DL on a Xilinx XC7VX485T chip and ran the RBM (Restricted Boltzmann Machine) algorithm on SVP-DL.
The experimental results show that the SVP-DL architecture on an FPGA achieves about twice the performance of a PC platform while running at a considerably lower clock frequency. In future work, we will deploy SVP-DL across multiple FPGA chips so that we can fully exploit the parallelism of deep learning algorithms and obtain better performance. We will also develop automatic tools to make the acceleration process much easier.
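As background for the RBM workload evaluated above, the following is a minimal NumPy sketch of RBM training with one-step contrastive divergence (CD-1). The layer sizes, learning rate, and toy data here are illustrative assumptions only, not the configuration mapped onto SVP-DL; the paper's hardware implementation is not reproduced by this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM: 6 visible units, 4 hidden units (illustrative sizes).
n_visible, n_hidden = 6, 4
W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible bias
b_h = np.zeros(n_hidden)    # hidden bias

def cd1_step(v0, lr=0.1):
    """One contrastive-divergence (CD-1) update on a batch of binary visible vectors."""
    global W, b_v, b_h
    # Positive phase: hidden activations driven by the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to visible, then to hidden again.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Gradient estimates (data term minus model term) and parameter updates.
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    # Reconstruction error as a rough progress measure.
    return float(np.mean((v0 - p_v1) ** 2))

# Train on a small random binary dataset.
data = (rng.random((8, n_visible)) < 0.5).astype(float)
errs = [cd1_step(data) for _ in range(200)]
```

The dominant operations here, matrix-vector products and elementwise sigmoids, are exactly the kind of regular vector work that a super-vector co-processor is positioned to accelerate.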