Design And Verification Of DMA For Sparse Matrix Vector Multiplication

Posted on:2021-10-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Cao

Full Text:PDF

GTID:2518306050970209

Subject:Master of Engineering

Abstract/Summary:

Sparse matrix-vector multiplication (SpMV) is the core computation in solving sparse linear systems, and it is widely used in scientific computing and in practical applications such as economic modeling and signal processing. In engineering applications, the SpMV kernel may be called tens of thousands of times. However, SpMV has a very low ratio of floating-point operations to memory accesses, and its memory access pattern is complex. Improving the computational performance of SpMV has therefore become key to improving engineering efficiency.

Current research on improving SpMV performance covers sparse matrix data compression algorithms, algorithms for predicting the best sparse matrix storage format, heterogeneous high-performance hardware computing structures, and cache structure optimizations. During SpMV computation, however, limited cache capacity leads to frequent cache misses, and the resulting memory access latency reduces SpMV performance.

The M processor is a high-performance multi-core processor with multiple DSP cores, independently developed by a research group at the National University of Defense Technology. Each core has abundant computing resources, supports up to 50 parallel multiply-add operations in a single cycle, and has a dual-vector Load/Store control unit that supplies the data required for vector computation. Its DMA unit has three host physical channels, which provide high-speed data transfer between storage units inside and outside the core.

Based on the project's requirements for High Performance Conjugate Gradient (HPCG) computing performance and on the structure of the processor units, this thesis proposes a data transfer method, SGDTM (Super Gather Data Transfer Mode), that improves the efficiency of discrete indirect memory access and thereby the performance of algorithms with memory access bottlenecks. Within the DMA component, a dedicated data channel, APip (Application Pipe), was designed to implement SGDTM, following the design approach of the general host physical channels. The main contributions of this thesis are as follows:

1. Based on the processor structure and resources available in this project, SGDTM, a transfer method that improves the efficiency of discrete indirect memory access, is proposed. The memory access principle of this transfer method is described in detail.
2. To realize SGDTM in the M processor, APip, a dedicated host physical channel for SpMV computation, was added to the original DMA design. The implementation of the main structures of the channel is described in detail, including the state machine, the index read logic, the data read logic, the exception detection logic, and the end-of-transfer logic.
3. Module-level verification was performed on the DMA with the added APip channel. A detailed verification plan was drawn up from the design specification, the verification platform was built in the Verilog hardware description language, and the design of each platform component is described in detail. Coverage files were analyzed and test stimuli were added to verify the DMA comprehensively. Excluding uncovered code that can be justified, coverage reached 100%.
4. The DMA module was logically synthesized with the DC synthesis tool under a 40 nm process from a certain manufacturer. The synthesis results meet the project's requirements for DMA timing, area, and power consumption.
5. SpMV performance was evaluated at the system level. The thesis describes the design and implementation of the SpMV algorithm structure and application, as well as the principle of the performance test. Bottlenecks in the processor that may affect the algorithm's performance were analyzed and optimized at both the software level and the hardware structure level. After optimization, SpMV performance improved significantly: double-precision floating-point performance reaches 14.62 GFLOPS, and the bandwidth utilization rate is 12.31%.
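The discrete indirect memory access that the abstract refers to can be illustrated with a generic SpMV kernel. The sketch below is not the thesis's SGDTM or APip design; it assumes the common CSR (compressed sparse row) storage format, which the abstract does not name, and the function names `gather` and `spmv_csr` are hypothetical. It only shows why the elements of the input vector must be fetched through an index array, which is the access pattern a gather-style transfer is meant to serve.

```python
from typing import List, Sequence


def gather(x: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Copy the scattered elements x[i] into one contiguous buffer."""
    return [x[i] for i in indices]


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        # Indirect access: the addresses read from x are only known
        # after col_idx has been read, so they are discrete and irregular.
        staged = gather(x, col_idx[start:end])
        # Once staged contiguously, the row reduces to a dense dot product.
        y.append(sum(v * s for v, s in zip(values[start:end], staged)))
    return y


# Example matrix (illustrative data only):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

Each stored nonzero contributes one multiply and one add but requires reading a value, a column index, and an indirectly addressed element of `x`, which is consistent with the low ratio of floating-point operations to memory accesses described above.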

Keywords/Search Tags:

SpMV, compression algorithm, DMA, HPCG, DSP, Gather/Scatter

Related items

1	The Design And Implement Of Vector Memory To Support Gather/Scatter
2	Implementation And Optimization Of HPCG On Multi-core And Many-core Platform
3	Based On The Pci Bus Aerospace Monitoring Data Recording System Design And Implementation
4	Computing SpMV on FPGAs
5	Design Of High-speed SG DMA Controller For TTE Terminal
6	Parallel Design And Optimization Of SpMV On ARM Multi-core Platform
7	Research On And Implementation Of Algorithm Of Incoherent Scatter Radar Signal Processing
8	Research On Target Gather Prediction Algorithm Based On Information Fusion
9	Software Design And Implementation Of FC-AE-1553 Node Based On PON Topology
10	Design And Implementation Of DMA In High-speed Network Interface Card Based On CXL