
The Study And Implementation Of Matrix Operation Architecture Based On Bounded-Delay Asynchronous Circuit

Posted on: 2024-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Diao
Full Text: PDF
GTID: 2558307079992489
Subject: Electronic Information·Computer Technology (Professional Degree)
Abstract/Summary:
As applications of high-performance computing have grown wider and deeper, matrix operations have become a foundational technology in many fields, including engineering and scientific computing. However, because matrix structure is flexible and matrix data density varies greatly, the operation time is difficult to determine. For the irregular operations of sparse matrices in particular, traditional single-core processors struggle to maintain spatial and temporal locality, which leads to problems such as high memory access latency and unbalanced loads. Multi-core processors can meet the speed requirements through parallel computation, but at the cost of complex hierarchical and localized interconnects, which often prevent them from reaching their peak computing capability. In some special scenarios, multi-core processors also cannot meet requirements such as small area and low power. It is therefore necessary to study a dedicated architecture for matrix operations and its implementation in ASIC technology.

Advanced matrix operation chips currently use a synchronous clock mechanism, but as the process technology approaches its limits and circuit scale increases, the clock circuit becomes an increasingly prominent bottleneck. A bounded-delay asynchronous circuit instead replaces the global clock with asynchronous controllers and uses a local handshake mechanism to manage circuit functions autonomously, which makes it well suited to irregular matrix operations. Because it has no global clock, the asynchronous circuit also offers advantages such as low power consumption and good resistance to electromagnetic interference.

This thesis studies a matrix operation architecture based on a bounded-delay asynchronous circuit, together with its design method, and implements an asynchronous matrix operation SoC (system on chip) in the UMC110 process. First, it introduces a design method for the bounded-delay asynchronous circuit and an asynchronous micro-pipeline structure based on the "sender-relay-receiver" model. Second, it analyzes matrix operation algorithms in detail and develops an adaptive strategy for calculating matrix data density. Sparse matrices are stored in the COO format with bidirectional compression. Next, it defines the micro-instructions for matrix operations and designs the corresponding architecture, which includes a distributed memory that supports asynchronous SRAM management and a sparse matrix merger that uses a "one-to-one" strategy. Finally, it implements an asynchronous SoC chip for matrix operations that integrates a CPU core, an SPI interface, and a NoC bus for internal communication. The final chip area is 17.3 mm², and the average power consumption is 4.68 mW. Experimental results show that the asynchronous SoC chip has good energy efficiency, and that the bounded-delay asynchronous circuit and its design method are feasible and efficient for implementing the matrix operation architecture. We believe this method can also provide a solution for the design of large-scale, complex asynchronous integrated circuits.
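
The abstract mentions COO storage and a density-driven adaptive strategy without giving details. As a rough illustration only, the Python sketch below shows plain COO storage (row, column, value triples), a density calculation, and a threshold-based choice between sparse and dense handling. The function names and the 0.25 threshold are assumptions made for this example; the sketch does not reproduce the thesis's bidirectional compression or its hardware architecture.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Coo = List[Tuple[int, int, float]]


def to_coo(dense: Matrix) -> Coo:
    """Return (row, col, value) triples for the nonzero entries, row-major."""
    return [
        (r, c, v)
        for r, row in enumerate(dense)
        for c, v in enumerate(row)
        if v != 0
    ]


def density(dense: Matrix) -> float:
    """Fraction of entries that are nonzero (0.0 for an empty matrix)."""
    total = sum(len(row) for row in dense)
    return len(to_coo(dense)) / total if total else 0.0


def choose_format(dense: Matrix, threshold: float = 0.25) -> str:
    """Pick a storage path from the density; the threshold is hypothetical."""
    return "coo" if density(dense) < threshold else "dense"


def coo_matvec(coo: Coo, x: List[float], n_rows: int) -> List[float]:
    """Multiply a COO matrix by a vector, touching only nonzero entries."""
    y = [0.0] * n_rows
    for r, c, v in coo:
        y[r] += v * x[c]
    return y


if __name__ == "__main__":
    a = [[0, 0, 3], [0, 0, 0], [1, 0, 0]]
    print(to_coo(a))
    print(choose_format(a))
    print(coo_matvec(to_coo(a), [1, 2, 3], n_rows=3))
```

The work done by `coo_matvec` scales with the number of nonzero entries rather than the matrix size, which is one reason the running time of sparse operations depends on the data.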
Keywords/Search Tags: Matrix Operation Architecture, Bounded-Delay Asynchronous Circuit, Low Power Design, Asynchronous SoC Chip