
Research on SIMD Vectorization Optimization Based on Memory Access

Posted on: 2012-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: M Yang
GTID: 2218330371962638
Subject: Computer software and theory
Abstract/Summary:
As hardware support for floating-point operations has improved, SIMD extensions are used more and more widely to raise application performance. However, non-contiguous and unaligned data references lower the efficiency of memory access in SIMD vectorization, so programs perform below expectations. The main factors that determine memory access efficiency are the cache hit rate and the number of memory accesses: a lower cache hit rate or redundant memory accesses both hurt performance.

Arrays of structures are used frequently in many applications. To address the space wasted in meeting the alignment requirements of an array of structures, memory pre-optimization must be applied to it; this reduces the memory occupied by the compacted data and improves the ability to recognize SIMD vectorization opportunities. References to members of an array of structures are usually vectorized incompletely and with severe overhead; to solve this problem, alignment optimization through array padding reduces unaligned memory accesses. Vectorizing non-array members of an array of structures can generate large overheads; to improve performance, SIMD memory access optimization for arrays of structures is implemented to reduce non-contiguous and unaligned memory accesses.

The array subscript accessed in an inner loop iteration sometimes does not depend on the loop index, so the same memory is accessed again and again. To reduce memory accesses, loop interchange makes it possible to reuse registers without affecting the cache hit rate. When the same array data is accessed repeatedly in different iterations of a loop, vector registers repeatedly load that data from cache. Loop unroll and jam can reuse vector registers and so remove many of these repeated memory accesses.

The compiler studied in this thesis, which identifies vectorizable code and vectorizes it automatically, was evaluated on an experimental platform used only for research. Experimental results on the gcc-vect and Callahan-Dongarra-Levine test suites show that the compiler's ability to identify vectorizable code is better than that of Intel 11.0. Experimental results on SPEC CPU2000 and NPB3.2-SER show that the algorithms in this thesis are correct and can improve program performance.
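The register reuse that loop unroll and jam provides can be illustrated with a minimal C sketch. The matrix-vector kernel, the function names, and the unroll factor of 2 are assumptions chosen for illustration; they are not taken from the thesis, and the code is scalar rather than explicitly vectorized.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: for every row i, each b[j] is loaded again from cache. */
void matvec_baseline(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j] * b[j];
        c[i] = sum;
    }
}

/* Unroll and jam with a hypothetical factor of 2: the outer loop is
 * unrolled twice and the two copies of the inner loop are fused, so
 * one load of b[j] serves two rows. */
void matvec_unroll_jam(const float *a, const float *b, float *c, size_t n)
{
    assert(n % 2 == 0); /* this sketch omits the remainder loop for odd n */
    for (size_t i = 0; i < n; i += 2) {
        float sum0 = 0.0f;
        float sum1 = 0.0f;
        for (size_t j = 0; j < n; j++) {
            float bj = b[j];                 /* loaded once, kept in a register */
            sum0 += a[i * n + j] * bj;       /* row i     */
            sum1 += a[(i + 1) * n + j] * bj; /* row i + 1 */
        }
        c[i] = sum0;
        c[i + 1] = sum1;
    }
}
```

Both functions accumulate each row in the same order, so they produce the same results; the second halves the number of loads of b. In a vectorized version, bj would correspond to a vector register holding several consecutive elements of b.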
Keywords/Search Tags: SIMD, Array of Structure, Memory Access, Reuse of Vector Register, Loop Interchange, Loop Unroll and Jam