| Coarse-Grained Reconfigurable Architecture (CGRA) is an efficient computing platform for cryptographic algorithms. Although on-chip resources in CGRAs are becoming increasingly abundant, existing CGRAs suffer from low resource utilization. They also cannot execute block cipher algorithms operating in chaining modes in a pipelined manner. These two problems limit the performance gains of cryptographic algorithm implementations. Therefore, targeting high area efficiency (performance per unit area), this thesis studies the architecture design and key technologies of coarse-grained reconfigurable cryptographic processors. The main research contents and achievements of the thesis are as follows. 1. This thesis analyzes the structural composition, computational bit width, and processing flow of different types of symmetric cryptographic algorithms under different working modes. From the perspectives of inter-module communication and data dependency, it extracts the architectural characteristics of pipelined symmetric cryptographic processing. Based on pipeline processing technology, it establishes a coarse-grained reconfigurable processing model for symmetric cryptographic algorithms and verifies the feasibility of this model through algorithm adaptation. The proposed model supports the construction of data-path pipelines, laying a theoretical foundation for designing coarse-grained reconfigurable cryptographic processors with high area efficiency. 2. To address the inability of existing CGRAs to balance computing-unit utilization against critical-path delay, this thesis proposes a Concurrent Pipelined Method (CPM). This method analyzes the data input, transmission, processing, and output stages of coarse-grained reconfigurable computing models, and studies the interconnection of reconfigurable units, the composition of functional units, and functional-unit idling and reuse techniques. A Coarse-Grained Reconfigurable Cryptographic-logic Array (CGRCA) processing architecture is
designed based on CPM, providing a new technical route for the development of reconfigurable cryptographic processors. Experimental results show that, compared with a processing architecture based on the regular pipelined method, the throughput of algorithms mapped onto CGRCA increases by up to 3.2 times and their area efficiency by up to 2.76 times. 3. When existing CGRA interconnection structures are applied to the CPM-based CGRCA, the interconnection network overhead doubles, concurrent pipelines cannot be constructed, and network congestion occurs. This thesis analyzes three data transmission modes and four dynamic data transmission scenarios that arise when CGRCA processes cryptographic algorithms. It proposes an interconnection network based on a ring topology and virtual buses, which sets up multi-level interconnection sub-networks and dynamic interconnection units between nodes. This network solves the problem of concurrent, dynamic data transmission among multiple pipelines on CGRCA while reducing path redundancy and hardware overhead. By introducing virtual buses, it realizes data bypass transmission and feedback transmission, addressing excessive global network overhead and interconnection contention. Compared with other typical interconnection structures, network overhead is reduced by approximately 41% at the cost of a 15% increase in network latency. 4. The memory system of existing CGRAs cannot meet CPM's need for concurrent data reads and writes, nor can it continuously feed blocks from different data packets into the CGRA pipeline; consequently, block cipher algorithms operating in chaining modes cannot be executed in a pipelined manner. Therefore, this thesis designs a memory system for CGRCA that supports concurrent access to multiple data streams and interleaved pipelined processing of multiple packets. By setting up distributed buffer arrays and a configurable interconnection network, this thesis solves the problem
that the existing memory system does not support concurrent reads and writes of multiple data streams. It also designs a buffer interconnection network that combines static and dynamic interconnection to achieve interleaved pipelined processing of packets from multiple cryptographic tasks, overcoming the performance bottleneck that prevents existing architectures from using pipelining to exploit parallelism across packets in chaining modes. In CBC mode, the performance of block cipher algorithms is increased by up to 11.36 times. 5. This thesis implements the CGRCA processing architecture in a 40 nm CMOS process and maps various symmetric cryptographic algorithms onto it. Experimental results show that, compared with other CGRA platforms, the throughput of CGRCA for block cipher algorithms in ECB mode is close to the highest among similar architectures, with typical algorithms reaching about 74.4% of the throughput of the best architecture. For stream cipher algorithms and hash functions, CGRCA achieves higher throughput than other architectures. In terms of area efficiency, besides effectively improving the area efficiency of CBC-mode block ciphers, CGRCA achieves 2.75 times and 6.04 times higher area efficiency than other CGRAs for stream ciphers and hash functions, respectively. In addition, compared with FPGA computing platforms in the same process, CGRCA improves area efficiency for cryptographic algorithms by about 14.04 times, further demonstrating that coarse-grained reconfigurable architectures have significant area-efficiency advantages over fine-grained reconfigurable architectures in cryptographic processing. |
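
The abstract defines area efficiency as performance per unit area. The short Python sketch below computes that metric and shows how a throughput gain and an area-efficiency gain relate. Because area efficiency is throughput divided by area, dividing the throughput ratio by the area-efficiency ratio gives the relative area. The function names, the Gbps/mm² units, and all sample numbers are illustrative assumptions, not data from the thesis. The sketch also reads the reported figures as plain ratios (3.2x, 2.76x) and assumes both maxima come from the same mapping, which the abstract does not state. Under those assumptions, 3.2 / 2.76 ≈ 1.16, which would imply roughly 16% more area for the CPM-based mapping.

```python
def area_efficiency(throughput_gbps: float, area_mm2: float) -> float:
    """Area efficiency = throughput per unit area (assumed units: Gbps/mm^2)."""
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    return throughput_gbps / area_mm2


def implied_area_ratio(throughput_gain: float, area_eff_gain: float) -> float:
    """Since AE = T / A, A_new / A_old = (T_new / T_old) / (AE_new / AE_old).

    Only meaningful when both gains are measured on the same pair of designs.
    """
    return throughput_gain / area_eff_gain


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not measurements from the thesis.
    baseline = area_efficiency(throughput_gbps=2.0, area_mm2=4.0)
    improved = area_efficiency(throughput_gbps=6.4, area_mm2=4.64)
    print(f"AE gain: {improved / baseline:.2f}x")                    # prints: AE gain: 2.76x
    print(f"Implied area ratio: {implied_area_ratio(3.2, 2.76):.3f}")  # prints: Implied area ratio: 1.159
```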
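
The overhead-versus-latency tradeoff reported for the interconnection network is typical of ring-based topologies. Rings need fewer links than meshes but produce longer paths. As a generic illustration only, the sketch below compares link count and average hop distance for a 16-node bidirectional ring and a 4×4 2D mesh. It assumes bidirectional ring links and XY-routed mesh hops. It models neither the virtual buses nor the multi-level sub-networks described in the abstract, and its numbers are not the thesis's 41% and 15% figures. All function names are hypothetical.

```python
from itertools import product


def ring_hops(i: int, j: int, n: int) -> int:
    """Shortest hop count between nodes i and j on a bidirectional ring of n nodes."""
    d = abs(i - j) % n
    return min(d, n - d)


def mesh_hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Hop count on a 2D mesh with XY routing (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def ring_stats(n: int) -> tuple[int, float]:
    """Return (link count, average hops over distinct node pairs); assumes n >= 3."""
    links = n  # one bidirectional link between each pair of ring neighbours
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    avg = sum(ring_hops(i, j, n) for i, j in pairs) / len(pairs)
    return links, avg


def mesh_stats(rows: int, cols: int) -> tuple[int, float]:
    """Return (link count, average hops over distinct node pairs) for a rows x cols mesh."""
    links = rows * (cols - 1) + cols * (rows - 1)
    nodes = list(product(range(rows), range(cols)))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    avg = sum(mesh_hops(a, b) for a, b in pairs) / len(pairs)
    return links, avg


if __name__ == "__main__":
    print("16-node ring:", ring_stats(16))
    print("4x4 mesh    :", mesh_stats(4, 4))
```

Running it shows the ring using 16 links with about 4.27 average hops. The mesh uses 24 links with about 2.67 average hops. That is the same qualitative direction as the reported tradeoff: fewer links, longer paths.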
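
Chaining modes resist pipelining because of the CBC recurrence `C_i = E_K(P_i XOR C_{i-1})`: block `i` cannot enter the cipher pipeline until block `i-1` has left it. Interleaving independent streams, such as packets from different cryptographic tasks, fills the otherwise idle stages. The cycle-count model below is a simplified sketch of that idea, not the thesis's memory system. It assumes a hypothetical `depth`-stage pipeline that accepts one block per cycle, a fixed latency of `depth` cycles, and round-robin issue across streams. The function `pipeline_cycles` and its parameters are illustrative names.

```python
def pipeline_cycles(num_streams: int, blocks_per_stream: int, depth: int, chained: bool) -> int:
    """Return the cycle at which the last block leaves a depth-stage pipeline.

    At most one block enters per cycle, and it finishes `depth` cycles later.
    In chained (CBC-like) mode, block i of a stream may enter only once block
    i-1 of the same stream has finished. Streams are scanned round-robin.
    """
    next_block = [0] * num_streams  # index of the next block to issue, per stream
    ready_at = [0] * num_streams    # earliest cycle the next block may issue, per stream
    remaining = num_streams * blocks_per_stream
    cycle = last_finish = rr = 0
    while remaining:
        for k in range(num_streams):
            s = (rr + k) % num_streams
            if next_block[s] < blocks_per_stream and ready_at[s] <= cycle:
                finish = cycle + depth
                last_finish = max(last_finish, finish)
                next_block[s] += 1
                # CBC: wait for this block's ciphertext; ECB: next block may follow at once.
                ready_at[s] = finish if chained else cycle + 1
                remaining -= 1
                rr = (s + 1) % num_streams
                break  # only one block may enter per cycle
        cycle += 1
    return last_finish


if __name__ == "__main__":
    D, N = 8, 16
    print("ECB, 1 stream :", pipeline_cycles(1, N, D, chained=False))  # 23
    print("CBC, 1 stream :", pipeline_cycles(1, N, D, chained=True))   # 128
    print("CBC, 8 streams:", pipeline_cycles(8, N, D, chained=True))   # 135 (8x the data)
```

With 8 stages and 16 blocks, a single CBC stream occupies only one stage at a time. It therefore takes 128 cycles, against 23 cycles in ECB. Interleaving 8 CBC streams processes 8 times as much data in 135 cycles. In this model, interleaving more streams than there are pipeline stages brings no further gain, because issue is already capped at one block per cycle.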