Design, Optimization And Verification Of The Floating-point MAC Unit For The 32 Bit High Performance M-DSP

Posted on:2017-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:W B Che

Full Text:PDF

GTID:2428330569498513

Subject:Software engineering

Abstract/Summary:

Digital signal processors (DSPs) are widely used in medicine, communications, the military, and other fields, because the specialized DSP instructions they provide allow digital signal processing to be implemented quickly. M-DSP is a 32-bit high-performance DSP developed independently by NUDT and is used in high-performance computing, wireless communication, video and graphics processing, and related fields. It runs at up to 1 GHz and adopts an 11-issue VLIW (Very Long Instruction Word) architecture. Building on the design and research of M-DSP, this thesis completes the design, verification, and optimization of a floating-point fused multiply-accumulate (FMAC) unit. The main work is as follows:

1. An FMAC with a six-stage pipeline was designed and implemented. The 32-bit FMAC was designed with low latency as the target. As required by the floating-point instruction set, the operations implemented by the FMAC unit include single- and double-precision floating-point multiplication, single- and double-precision floating-point multiply-add and multiply-subtract, floating-point complex multiplication, and dot product with dual floating-point paths. Because of the limited number of register-file read ports and the width of the data path, double-precision instructions need an extra cycle for both reading and writing, and complex multiplication and dot product need an extra cycle for reading.

2. Every instruction of the FMAC was analyzed and optimized. To balance area and latency, an optimization strategy based on sharing hardware resources was adopted. A 54×32 multiplier was designed and implemented; it is composed of four 27×16 sub-multipliers with the same architecture. The modules for exception detection, alignment shifting, and normalization were then optimized.

3. The FMAC was fully verified using the NC simulator from Cadence. The unit was verified hierarchically, from module level to system level. First, a C golden model of the FMAC was written, and its results were taken as the reference when checking the FMAC's outputs. Second, module verification, random verification, precision verification, pipeline execution verification, global signal verification, and combined-instruction verification were carried out, and coverage was then analyzed. Finally, formal verification was performed. The verification results indicate that the designed instructions function correctly and that boundary values are handled in accordance with the IEEE 754 standard.

4. The FMAC was synthesized and optimized using Synopsys Design Compiler with a 40 nm technology library. In line with the design targets of the M-DSP architecture, the FMAC unit was synthesized under typical-case conditions, and the critical path was optimized based on the experimental results. The synthesis results show a critical path of 450 ps, a frequency of up to 1 GHz, power consumption of 6.7570 mW, and a cell area of 35250 μm². The overall performance is higher than that of the traditional architecture and meets the performance requirements of M-DSP.
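The abstract does not say how the four 27×16 sub-multipliers are combined into the 54×32 multiplier. One plausible reading, sketched below purely as an illustration, is that the 54-bit operand is split into two 27-bit halves and the 32-bit operand into two 16-bit halves, with the four partial products shifted and summed. The function names `mul27x16` and `mul54x32`, the unsigned operands, and the split itself are assumptions, not details taken from the thesis.

```python
# Illustrative sketch only: assumes unsigned operands and a simple
# half-by-half split; the thesis's actual multiplier structure is not
# described in the abstract.

A_HALF_BITS = 27  # each half of the 54-bit operand
B_HALF_BITS = 16  # each half of the 32-bit operand


def mul27x16(a: int, b: int) -> int:
    """Stand-in for one 27x16 sub-multiplier (hypothetical name)."""
    assert 0 <= a < (1 << A_HALF_BITS)
    assert 0 <= b < (1 << B_HALF_BITS)
    return a * b


def mul54x32(a: int, b: int) -> int:
    """Form a 54x32 product from four 27x16 partial products."""
    assert 0 <= a < (1 << 2 * A_HALF_BITS)
    assert 0 <= b < (1 << 2 * B_HALF_BITS)

    # Split each operand into a low and a high half.
    a_lo = a & ((1 << A_HALF_BITS) - 1)
    a_hi = a >> A_HALF_BITS
    b_lo = b & ((1 << B_HALF_BITS) - 1)
    b_hi = b >> B_HALF_BITS

    # Each partial product is weighted by the bit positions of its halves.
    return (
        mul27x16(a_lo, b_lo)
        + (mul27x16(a_hi, b_lo) << A_HALF_BITS)
        + (mul27x16(a_lo, b_hi) << B_HALF_BITS)
        + (mul27x16(a_hi, b_hi) << (A_HALF_BITS + B_HALF_BITS))
    )
```

Under this reading, the same four 27×16 blocks could presumably be shared by the narrower operations, which would fit the resource-sharing strategy the abstract mentions, but the abstract does not confirm this.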

Keywords/Search Tags:

Digital Signal Processor, Floating-point Multiply Accumulate, multiplier, verification, optimization

Related items

1	The Design, Optimization And Verification Of Fixed-point Multiply Accumulate For X-DSP
2	The Design And Implementation Of High-performance 64 Bit Fixed-point SIMD Multiply Accumulate For FT-XDSP
3	Design Optimization And Verification Of Floating Point Units Based On BOOM
4	The Design, Verification And Optimization Of Multiplier Unit Based-on X-DSP
5	The Research And Implement Of The High Performance Floating-Point Multiply, Add Unit
6	The Research And Implementation Of High Performance SIMD Floating-point Multiplication Accumulator Unit For FT-XDSP
7	The Design Of Floating-Point Multiply-Add Fused Units In General Purpose Processors
8	The Design And Implementation Of Floating Point Unit Based On ARMv7 Floating Point Instruction Set
9	The Architecture And Implementation Of Arithmetic Clusters Based On Stream Applications
10	The Design And Implement Of Floating-point Fused-multiply-add Unit For High-performance Microprocessor