Font Size: a A A

Design And Implementation Of FFTs Based On Multicore Vector Processor

Posted on:2015-01-25Degree:MasterType:Thesis
Country:ChinaCandidate:H W XiangFull Text:PDF
GTID:2308330479479459Subject:Software engineering
Abstract/Summary:PDF Full Text Request
FFT algorithm plays an important role in high-performance computer domain as a digital signal processing tool. It has theoretical significance and application value in studying the vectorized design and implementation of high effic ient FFT based on X-DSP’s multicore architecture.Character of FFT algorithm is deeply analyzed in the paper. Paper’s main work includes having completed the design and implementation of radix-2, radix-4, large point FFT and mixed-radix FFT based on X-DSP platform. As is shown below:(1) Design and implement radix-2 FFT algorithm on X-DSP. After analyzed DIT and DIF radix-2 butterfly unit, some improvement measures are carried out to make full use of fused multiply-add instructions, therefore, FLOPS throughput is increased; Combing shuffle needs with memory access requests and utilizing software pipelining method to improve enhance the execution efficiency of programs. Experiment results show that, comparing to CUFFT, single precision average computing performance speedup is 3.12, double precision is 22.97. Comparing to FFTW, single precision average computing performance speed-up ratio is 3.52, double precision is 25.29.(2) Design and implement radix-4 FFT algorithm on X-DSP. Make full use of fused multiply-add instructions to optimize the DIT and DIF radix-2 butterfly unit, meanwhile, combe the shuffle needs with memory access requests and employ software pipelining method.Experiment results show that the average performance of radix-4 FFT has improved 11.46%-21.34% to DIT, and 9.1% to DIF, compared with radix-2.(3) Design and implement large point FFT algorithms on X-DSP. Make detailed analysis on both MFA and iteration FFT. Then single-core programs of large point FFT algorithms are carried out based on DMA double buffer. A novel method used in compressing storage factors is proposed which saves a lot storage space. In the meanwhile, map the parallel block MFA algorithm to the multi- core X-DSP, optimize the load balancing, and implement large point FFT algorithm efficiently in the final. As the result shows, the average speedup has reached 6.43, achieved higher performance.(4) Design and implement mixed-radix FFT algorithms on X-DSP. Improving work about fused multiply-add instructions is done to radix-3 and radix-5 FFT algorithm’s butterfly unit. Experimental result shows that the calculating time of 1536-point and 2400-point single precision is respectively 0.00247 ms and 0.00348 ms, achieves high performance. With the point increasing, the performance has improved significantly.
Keywords/Search Tags:FFT, Fused Multiply-Add, Multicore Processors, Vectorization, Software Pipelining
PDF Full Text Request
Related items