| With the development of quantum computing, traditional public-key cryptosystems face security threats from attacks by quantum computers. Migrating from existing encryption technology to quantum-resistant post-quantum cryptography (PQC) is an active research topic. Among existing PQC schemes, those based on lattice problems are among the most promising because of their simple structure, high speed, and strong security. Saber, the algorithm studied in this paper, is one of the most popular lattice-based schemes owing to its small public keys, low transmission bandwidth, and ease of implementation. Given Saber's research value, this paper designs a flexible, high-performance Saber security coprocessor that executes the algorithm quickly and efficiently. First, the execution steps of Saber are analyzed in detail, a design based on a custom-instruction-set coprocessor is chosen, and an efficient functional-module interface is designed, which simplifies the system structure and improves instruction execution efficiency. Second, a SHA3 module with reduced functionality is designed to lower resource consumption. Third, because the number theoretic transform (NTT) cannot be used to accelerate polynomial multiplication in Saber, a high-performance multiplier based on the Karatsuba algorithm is proposed. The multiplier combines parallelization and pipelining to optimize both the multiplication and the data loading, and completes a 256-degree polynomial multiplication within 128 cycles. Finally, a set of instructions is designed to control the coprocessor through all the steps defined in Saber, ensuring that the coprocessor is functionally complete. RTL-level functional simulation shows that the coprocessor requires 13,233 clock cycles to complete a key exchange: 3,267 cycles for key generation, 4,656 cycles for key encapsulation, and 5,310 cycles for key decapsulation. The timing analysis report from Vivado 2018.3 shows a maximum operating frequency of 345 MHz. The resource report shows that the coprocessor occupies 25,346 LUTs, 12,613 FFs, 384 DSPs, and two 36 Kbit BRAMs. Compared with recent designs in the international literature, the proposed coprocessor has advantages in scalability, resource utilization, and system performance. It is suitable for applications with high performance requirements and has both practical and research value. |
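
The abstract names Karatsuba as the basis of the polynomial multiplier. As a rough software illustration only, the sketch below multiplies two 256-coefficient polynomials with recursive Karatsuba and then reduces the product in Saber's ring, modulo x^256 + 1 with q = 2^13. That power-of-two modulus is why the NTT does not apply directly. The function names (`schoolbook`, `karatsuba`, `ring_mul`) and the recursion threshold of 16 are assumptions made for this example. The code does not model the paper's parallel, pipelined hardware datapath or its 128-cycle schedule.

```python
Q = 1 << 13   # Saber's power-of-two modulus q = 2^13
N = 256       # number of coefficients per polynomial


def schoolbook(a, b):
    """Plain O(n^2) polynomial product; used as the recursion base case."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] += ai * bj
    return res


def karatsuba(a, b, threshold=16):
    """Karatsuba product of two equal-length polynomials.

    Assumes len(a) == len(b) and that the length is a power of two.
    The threshold is an illustrative choice, not a value from the paper.
    """
    n = len(a)
    if n <= threshold:
        return schoolbook(a, b)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    low = karatsuba(a0, b0, threshold)    # a0 * b0
    high = karatsuba(a1, b1, threshold)   # a1 * b1
    a_sum = [x + y for x, y in zip(a0, a1)]
    b_sum = [x + y for x, y in zip(b0, b1)]
    mid = karatsuba(a_sum, b_sum, threshold)  # (a0 + a1) * (b0 + b1)
    res = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        res[i] += low[i]
        res[i + h] += mid[i] - low[i] - high[i]  # cross term a0*b1 + a1*b0
        res[i + 2 * h] += high[i]
    return res


def ring_mul(a, b, n=N, q=Q):
    """Multiply in Z_q[x] / (x^n + 1): fold x^(n+k) back as -x^k, then reduce mod q."""
    prod = karatsuba(a, b)
    res = [0] * n
    for i, c in enumerate(prod):
        if i < n:
            res[i] += c
        else:
            res[i - n] -= c
    return [c % q for c in res]
```

Each Karatsuba level replaces four half-size products with three, at the cost of a few extra additions. Hardware designs typically exploit this trade-off to reduce the number of multiplier units needed.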