Zero-knowledge proofs (ZKPs) provide effective solutions to privacy-protection problems in digital signatures, blockchains, and distributed storage. Zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs) are widely used in open-source projects because they produce fixed-length proofs that can be verified quickly. However, ZKP protocols incur heavy computational overhead. Deploying them with high performance and low power consumption in real scenarios has therefore become a major obstacle to their adoption. In view of the shortcomings of existing acceleration schemes, this paper explores how to improve the speed and energy efficiency of proof generation while keeping good scalability and algorithm adaptability. To this end, it uses high-level synthesis (HLS) to design FPGA-based hardware architectures for the number-theoretic transform (NTT) and elliptic curve operations, and it implements a heterogeneous ZKP computing system.

First, this paper proposes a highly pipelined architecture for the large-scale, large-bitwidth NTT in zk-SNARKs. Large-bitwidth modular arithmetic is optimized, and low-latency hardware units for modular addition, subtraction, and multiplication are designed. Large-scale NTT tasks are then divided into smaller sub-tasks through two-dimensional partitioning, which reduces hardware resource overhead. A low-latency butterfly unit is combined with a data-reordering strategy to implement a two-level pipeline. One level runs across sub-tasks, and the other runs across butterfly stages with different strides. The architecture scales flexibly to FPGAs of different sizes. Evaluation shows that the NTT module is 1.95× faster than the NTT module in PipeZK.

Second, a hardware/software co-computation scheme is designed for multi-scalar multiplication (MSM) in zk-SNARKs. The modular arithmetic units are pipelined, and a hardware unit for elliptic curve point addition is implemented. A pipeline is then formed by designing the data flow between the stages of point addition. Furthermore, the point summation task is transformed into a sequence of batched point additions based on an addition tree. A point summation architecture is built on top of the pipelined point adder. This architecture can accelerate computing tasks such as elliptic curve point summation and point multiplication. It can also cooperate with software to accelerate MSM.

Finally, a heterogeneous computing system for zero-knowledge proofs is implemented. The proposed NTT and elliptic curve architectures are implemented with HLS using the Xilinx Vitis development toolkit. They are then integrated, under the OpenCL heterogeneous programming framework, into Bellman, a well-known open-source ZKP project. The result is a complete heterogeneous computing system on an AMD Xilinx Alveo U50. Evaluation against a single core and 12 cores of an AMD Ryzen 9 5900X gives the following results:

- The NTT module achieves 27.98× and 1.74× speedups, respectively, and 6.9× and 6× energy-efficiency improvements, respectively.
- The point summation module achieves 41.5× and 3× speedups, respectively, and an energy-efficiency improvement of up to 12.42×.
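The abstract mentions optimized large-bitwidth modular addition, subtraction, and multiplication but does not say which reduction algorithm the hardware units use. As an illustration only, the following Python sketch assumes Montgomery multiplication with R = 2^256. It uses the BLS12-381 scalar-field modulus as a representative ~255-bit zk-SNARK field. The constant, the radix, and all function names are assumptions, not taken from the paper. Addition and subtraction need at most one conditional correction because both inputs are already reduced. This is what makes them cheap to build with low latency.

```python
# Illustrative modulus (assumption): BLS12-381 scalar field, about 255 bits.
P = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001
K = 256                            # Montgomery radix R = 2^K, with R > P
R = 1 << K
MASK = R - 1
R2 = (R * R) % P                   # R^2 mod P, used to enter Montgomery form
P_NEG_INV = (-pow(P, -1, R)) % R   # -P^{-1} mod R (three-arg pow needs Python 3.8+)


def mod_add(a: int, b: int) -> int:
    """(a + b) mod P for reduced inputs: at most one conditional subtraction."""
    s = a + b
    return s - P if s >= P else s


def mod_sub(a: int, b: int) -> int:
    """(a - b) mod P for reduced inputs: at most one conditional addition."""
    d = a - b
    return d + P if d < 0 else d


def redc(t: int) -> int:
    """Montgomery reduction: returns t * R^{-1} mod P for 0 <= t < P * R."""
    m = ((t & MASK) * P_NEG_INV) & MASK   # chosen so that t + m*P is divisible by R
    u = (t + m * P) >> K                  # exact division by R is just a shift
    return u - P if u >= P else u         # u < 2P, so one correction suffices


def to_mont(a: int) -> int:
    return redc(a * R2)                   # a * R mod P


def from_mont(a: int) -> int:
    return redc(a)                        # a * R^{-1} mod P


def mont_mul(a: int, b: int) -> int:
    """Product of two Montgomery-form values, kept in Montgomery form."""
    return redc(a * b)


a, b = P - 1, P - 2
prod = from_mont(mont_mul(to_mont(a), to_mont(b)))
print(prod)  # (-1) * (-2) = 2 mod P, so this prints 2
```

In a hardware unit, the 256-bit products inside `redc` would typically be split into smaller limb multiplications. Python's arbitrary-precision integers hide that detail here.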
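The abstract does not spell out how the two-dimensional partitioning of a large NTT works. One standard way to split an N-point NTT into smaller, independent sub-tasks is the four-step (row-column) decomposition with N = N1 · N2. It uses the index maps n = N2·n1 + n2 and k = k1 + N1·k2. The steps are:

1. Compute N2 column NTTs of length N1.
2. Multiply by twiddle factors ω^(n2·k1).
3. Compute N1 row NTTs of length N2.
4. Write the result back in transposed order.

The sketch below assumes this decomposition. For speed, it uses the small NTT-friendly prime 998244353 (primitive root 3) instead of a ~255-bit zk-SNARK field. The names `butterfly`, `ntt`, and `four_step_ntt` are hypothetical. Each sub-NTT is a radix-2 Cooley-Tukey transform built from butterflies whose stride doubles from one stage to the next.

```python
P = 998244353   # 119 * 2^23 + 1, NTT-friendly prime (illustrative choice)
G = 3           # primitive root modulo P


def butterfly(u, v, w, p):
    """Radix-2 butterfly: returns (u + w*v, u - w*v) mod p."""
    t = v * w % p
    return (u + t) % p, (u - t) % p


def ntt(a, w, p):
    """Iterative radix-2 NTT, X[k] = sum_n a[n] * w^(n*k) mod p; len(a) is a power of 2."""
    n = len(a)
    a = list(a)
    # Bit-reversal reordering so that butterflies can operate in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Stages with butterfly stride 1, 2, 4, ..., n/2.
    length = 2
    while length <= n:
        w_len = pow(w, n // length, p)   # primitive length-th root of unity
        half = length // 2
        for start in range(0, n, length):
            tw = 1
            for k in range(half):
                a[start + k], a[start + k + half] = butterfly(
                    a[start + k], a[start + k + half], tw, p)
                tw = tw * w_len % p
        length <<= 1
    return a


def four_step_ntt(x, n1, n2, p=P, g=G):
    """N-point NTT (N = n1*n2) split into n2 column NTTs and n1 row NTTs."""
    n = n1 * n2
    w = pow(g, (p - 1) // n, p)          # primitive N-th root of unity
    w1, w2 = pow(w, n2, p), pow(w, n1, p)
    # Step 1: column c holds x[n2*r + c] for r = 0..n1-1; transform each column.
    cols = [ntt([x[n2 * r + c] for r in range(n1)], w1, p) for c in range(n2)]
    out = [0] * n
    for k1 in range(n1):
        # Step 2: twiddle factors w^(c*k1). Step 3: row NTT of length n2.
        row = [cols[c][k1] * pow(w, c * k1, p) % p for c in range(n2)]
        z = ntt(row, w2, p)
        # Step 4: transposed write-back, X[k1 + n1*k2] = z[k2].
        for k2 in range(n2):
            out[k1 + n1 * k2] = z[k2]
    return out


def naive_ntt(x, p=P, g=G):
    """O(N^2) reference transform used to check the decomposition."""
    n = len(x)
    w = pow(g, (p - 1) // n, p)
    return [sum(v * pow(w, i * k, p) for i, v in enumerate(x)) % p for k in range(n)]


x = list(range(16))
assert four_step_ntt(x, 4, 4) == naive_ntt(x)
```

The column and row sub-NTTs are independent of one another, so they can be streamed through a pipeline. The bit-reversal step and the transposed write-back are examples of data reordering. The abstract does not say how the paper's own reordering strategy is organized.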
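The abstract describes transforming point summation into a batched sequence of point additions arranged as an addition tree. Below is a minimal sketch of that idea under the following assumptions:

- The curve is a toy short-Weierstrass curve y^2 = x^3 + 2x + 3 over F_97.
- Points use affine coordinates. Hardware designs commonly use projective coordinates to avoid field inversions, and the paper's choice is not stated.
- `point_add` and `tree_sum` are hypothetical names.

All additions within one tree level are independent, so a pipelined point adder can issue them back to back. A sequential accumulation, by contrast, would have to wait for each previous result.

```python
from typing import List, Optional, Tuple

Point = Optional[Tuple[int, int]]   # None represents the point at infinity

P = 97            # toy field modulus (illustrative only)
A, B = 2, 3       # curve y^2 = x^3 + A*x + B


def on_curve(pt: Point) -> bool:
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P == 0


def point_add(p1: Point, p2: Point) -> Point:
    """Affine point addition, covering doubling and the identity cases."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                        # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P    # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def tree_sum(points: List[Point]) -> Point:
    """Sum points level by level; additions within a level are independent."""
    level = list(points)
    if not level:
        return None
    while len(level) > 1:
        nxt = [point_add(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:              # an odd leftover passes to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


G = (3, 6)
assert on_curve(G)
pts = [G]
for _ in range(9):
    pts.append(point_add(pts[-1], G))   # G, 2G, ..., 10G
seq = None
for pt in pts:
    seq = point_add(seq, pt)            # sequential accumulation
assert tree_sum(pts) == seq             # both equal 55G
```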
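The abstract does not say exactly how MSM work is divided between software and the FPGA. One plausible split, assumed here for illustration, follows Pippenger's bucket method in three stages:

1. Software slices each scalar into c-bit windows and groups points into buckets by digit.
2. Each bucket sum is an independent point summation task that a hardware summation module could take.
3. Software combines the bucket sums with a running-sum pass.

To stay short and self-contained, the sketch is generic over any additive group (`add`, `zero`). It is exercised with integers modulo Q = 1000003 (an arbitrary choice) standing in for curve points. In a real system, `add` would be elliptic curve point addition and `bucket_sum` the offloaded summation. All names and parameters are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def msm_bucket(
    scalars: Sequence[int],
    points: Sequence[T],
    add: Callable[[T, T], T],
    zero: T,
    scalar_bits: int,
    c: int,
    bucket_sum: Callable[[List[T]], T],
) -> T:
    """Computes sum_i scalars[i] * points[i] with c-bit windows and buckets."""
    num_windows = (scalar_bits + c - 1) // c
    result = zero
    for w in reversed(range(num_windows)):
        # Multiply the accumulated result by 2^c (c doublings).
        for _ in range(c):
            result = add(result, result)
        # Software side: slice each scalar into a c-bit digit, group points by digit.
        buckets: List[List[T]] = [[] for _ in range(1 << c)]
        for s, pt in zip(scalars, points):
            digit = (s >> (w * c)) & ((1 << c) - 1)
            if digit:
                buckets[digit].append(pt)
        # Offloadable side (assumption): each bucket is an independent summation task.
        sums = [bucket_sum(b) for b in buckets]
        # Software side: sum_j j * B_j via running sums, highest bucket first.
        running, window_total = zero, zero
        for j in range((1 << c) - 1, 0, -1):
            running = add(running, sums[j])
            window_total = add(window_total, running)
        result = add(result, window_total)
    return result


Q = 1_000_003   # integers modulo Q stand in for curve points


def add_mod(a: int, b: int) -> int:
    return (a + b) % Q


def sum_mod(bucket: List[int]) -> int:
    # Stand-in for the offloaded point summation task.
    return sum(bucket) % Q


scalars = [5, 200, 77, 255, 0, 1]
points = [12345, 999, 31337, 42, 7, 1_000_002]
res = msm_bucket(scalars, points, add_mod, 0, scalar_bits=8, c=3, bucket_sum=sum_mod)
assert res == sum(s * p for s, p in zip(scalars, points)) % Q
```

The window width c trades the number of buckets per window (2^c − 1) against the number of windows (about scalar_bits / c). It also controls how much summation work each offloaded batch contains.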