Design And Verification Of DMA For Sparse Matrix Vector Multiplication

Posted on: 2021-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Cao
Full Text: PDF
GTID: 2518306050970209
Subject: Master of Engineering
Abstract/Summary:
Sparse matrix-vector multiplication (SpMV) is the core computation in solving sparse linear systems and is widely used in scientific computing and in practical applications such as economic modeling and signal processing. In engineering applications, the SpMV kernel may be called tens of thousands of times. However, SpMV has a very low ratio of floating-point operations to memory access operations, and its memory access pattern is complex. Improving SpMV performance has therefore become key to improving engineering efficiency. Current research on improving SpMV performance covers sparse matrix data compression algorithms, sparse matrix storage format prediction algorithms, heterogeneous high-performance hardware architectures, and cache structure optimizations. Even so, SpMV computation still suffers from limited cache capacity and frequent cache misses, and the resulting memory access latency reduces SpMV performance.

The M processor is a high-performance multi-core processor developed independently by a research group at the National University of Defense Technology, and it contains multiple DSP cores. Each core has abundant computing resources, supports up to 50 parallel multiply-add operations in a single cycle, and has a dual vector Load/Store control unit that supplies the data required for vector computation. Its DMA unit has three host physical channels, which provide high-speed data transfer between storage units inside and outside the core.

Based on the project's performance requirements for the High Performance Conjugate Gradient (HPCG) benchmark and on the structure of the processor, this thesis proposes a data transfer method, SGDTM (Super Gather Data Transfer Mode), that improves the efficiency of discrete indirect memory access and thereby improves the performance of algorithms with memory access bottlenecks. Within the DMA component, a dedicated data channel, APip (Application Pipe), was designed to implement SGDTM, following the design approach of the general host physical channels.

The main work of this thesis is as follows:

1. Based on the processor structure and resources in this project, SGDTM, a transfer method that improves the efficiency of discrete indirect memory access, is proposed, and its memory access principle is described in detail.

2. To implement SGDTM in the M processor, a dedicated host physical channel for SpMV computation, APip, was added to the original DMA design. The implementation of the channel's main structure is described in detail, including the state machine, index read, data read, exception detection, and end-of-transfer parts.

3. Module-level verification was performed on the DMA with the added APip channel. A detailed verification plan was drawn up from the design specification, a verification platform was built in the Verilog hardware description language, and the design of each platform component is described in detail. Coverage files were analyzed and test stimuli were added to verify the DMA comprehensively. Excluding some uncovered code that can be justified, coverage reached 100%.

4. The DMA module was logically synthesized with the DC synthesis tool under a manufacturer's 40 nm process. The synthesis results meet the project's requirements for DMA timing, area, and power consumption.

5. SpMV performance was evaluated at the system level. The thesis describes the design and implementation of the SpMV algorithm structure and application, as well as the principle of the performance test. Bottlenecks in the processor that may affect the algorithm's performance were analyzed and optimized at both the software level and the hardware structure level. After optimization, SpMV performance improved significantly: double-precision floating-point performance reaches 14.62 GFLOPS, with a bandwidth utilization of 12.31%.
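
The discrete indirect memory access that the abstract attributes to SpMV can be illustrated with a small sketch. The Python code below is an illustration only, not the thesis's SGDTM or APip design. It assumes the matrix is stored in CSR format (a common choice; the abstract does not say which format the thesis uses), and the helper `gather` is a hypothetical stand-in for a transfer that collects scattered entries of `x` into a contiguous buffer before the multiply-add loop runs.

```python
from typing import List, Sequence


def gather(x: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Copy the scattered entries x[indices[k]] into one contiguous buffer.

    Hypothetical helper: it models an index-driven (indirect) read in software.
    """
    return [x[i] for i in indices]


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # Indirect reads: the addresses into x come from col_idx, so they
        # are data-dependent and generally not contiguous.
        x_buf = gather(x, col_idx[start:end])
        # Multiply-add over two contiguous buffers.
        acc = 0.0
        for a, b in zip(values[start:end], x_buf):
            acc += a * b
        y.append(acc)
    return y


# Example 3 x 3 matrix:
#   [[4, 0, 1],
#    [0, 2, 0],
#    [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [7.0, 4.0, 18.0]
```

Separating the gather step from the arithmetic shows where the irregular accesses sit: once the needed entries of `x` are in a contiguous buffer, the multiply-add loop reads only sequential data. In this sketch each stored nonzero costs one multiply and one add but requires reading a matrix value, a column index, and an entry of `x`, which is consistent with the low ratio of floating-point operations to memory accesses noted in the abstract.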
Keywords/Search Tags: SpMV, compression algorithm, DMA, HPCG, DSP, Gather/Scatter