
The Design And Implementation Of Multiple-precision Floating-point Multiply-Add Fused Unit

Posted on: 2017-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Liu
GTID: 2348330488474194
Subject: Engineering
Abstract/Summary:
In recent years, with the wide use of scientific computing and multimedia technology, the demand for floating-point performance has kept increasing. The floating-point fused multiply-add unit is a key component of the floating-point processing unit, and the goal of this work is to design a high-performance one. A fused multiply-add unit merges floating-point multiplication and addition into a single operation. Building on the traditional fused multiply-add unit, this thesis designs a new multi-mode architecture. Following the IEEE-754 floating-point standard and using Single Instruction Multiple Data (SIMD) techniques, the design supports either one double-precision multiply-add operation or two parallel single-precision multiply-add operations.

The design is based on fixed-point addition and multiplication theory. The inversion stage for operand C is moved ahead of the alignment shifter, and exponent processing, the alignment shifter, mantissa multiplication, and Leading Zero Anticipation (LZA) are described in detail. The design improves on the traditional unit in two respects. First, by modifying the control signals, the datapath also supports two single-precision operations: the alignment shifter handles either one 161-bit double-precision shift or two 74-bit single-precision shifts, and the mantissa multiplier performs either one 53×53 or two 24×24 unsigned multiplications. This introduces a slight delay but significantly reduces area by sharing hardware resources between the two modes. Second, the design uses 4-2 compressors, which have smaller delay and higher compression efficiency than carry-save adder (CSA) compressors. Leading Zero Anticipation uses parallel error detection and correction, which avoids lengthening the critical path. A two-stage normalization shift is used, which shortens the operand width and reduces the delay of both the Leading Zero Anticipation logic and the normalization shifter.

The design is implemented as a three-stage pipeline, with the RTL code written in VHDL. It was synthesized with Synopsys Design Compiler using a TSMC 0.18-micron CMOS standard cell library. The synthesized area is 728588 square microns and the maximum delay is 3.45 ns per stage. Compared with a traditional double-precision floating-point fused multiply-add unit, the multi-mode unit increases area by only about 18% and delay by about 10%.
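
As a point of reference for what merging the multiplication and the addition into one operation means numerically, the sketch below contrasts a separate multiply followed by an add (two roundings) with a fused multiply-add (one rounding of the exact result). It is a software reference model only, not the thesis's VHDL datapath: the function names and the sample operands are illustrative assumptions, and the model handles finite inputs only (no NaN, infinity, or signed-zero handling).

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    product = a * b       # first rounding to double precision
    return product + c    # second rounding to double precision


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Reference model of a fused multiply-add for finite inputs.

    The product and sum are formed exactly with rational arithmetic,
    and the result is rounded to double precision only once.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)   # single rounding (round-to-nearest-even)


# Hypothetical operands chosen so that the exact product, 1 - 2**-60,
# is not representable in double precision and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))
print("fused:  ", fused_multiply_add(a, b, c))
```

With these operands the unfused path loses the low-order part of the product before the addition, while the fused path keeps it, which is the accuracy benefit usually associated with a fused unit alongside the latency savings.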
Keywords/Search Tags: FPU, Multiply-Add-Fused, SIMD, IEEE-754, VHDL