Research Of SIMD Vectorization Optimization Based On Memory Access

Posted on:2012-11-13

Degree:Master

Type:Thesis

Country:China

Candidate:M Yang

Full Text:PDF

GTID:2218330371962638

Subject:Computer software and theory

Abstract/Summary:

As support for floating-point operations has improved, SIMD extensions are used more and more widely to improve application performance. However, non-contiguous and unaligned data references lower the efficiency of memory access in SIMD vectorization, so programs perform worse than expected. The main factors that determine memory access efficiency are the cache hit rate and the number of memory accesses: a lower cache hit rate or redundant memory accesses both degrade performance.

Arrays of structures are used frequently in many applications. To avoid the space wasted in meeting the alignment requirements of an array of structures, a memory pre-optimization is applied to it, which compacts the data to reduce its memory footprint and improves the compiler's ability to recognize SIMD vectorization opportunities. References to members of an array of structures are usually vectorized incompletely and with severe overhead; to solve this problem, alignment optimization through array padding reduces unaligned memory accesses. Vectorizing non-array members of an array of structures can also incur large overhead; to improve performance, a SIMD memory access optimization for arrays of structures is implemented to reduce non-contiguous and unaligned memory accesses.

The subscript of an array accessed in an inner loop iteration is sometimes independent of the loop index, so the same memory location is accessed repeatedly. To reduce these memory accesses, loop interchange makes register reuse possible without affecting the cache hit rate. When the same array data is accessed in different iterations of a loop, it is repeatedly loaded from the cache into vector registers. Loop unroll-and-jam allows some vector registers to be reused, which removes much of this repeated memory access.

The vectorization-recognizing, automatically vectorizing compiler developed in this work is evaluated on an experimental platform used only for research. Experimental results on the gcc-vect and Callahan-Dongarra-Levine test suites show that the compiler's ability to recognize vectorizable code is better than that of Intel 11.0. Experimental results on SPEC CPU2000 and NPB3.2-SER show that the algorithms in this work are correct and improve program performance.
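
The unroll-and-jam idea mentioned in the abstract can be illustrated with a minimal scalar sketch in C. The matrix-vector kernel and the function names matvec_baseline and matvec_unroll_jam are assumptions chosen for illustration and are not taken from the thesis. A vectorizing compiler would keep a whole vector of x elements in a vector register rather than the single scalar shown here, but the reuse pattern is the same.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: y[i] += a[i][j] * x[j] with a stored row-major.
   The inner loop re-reads all of x for every row, so each x[j]
   is loaded once per row. */
void matvec_baseline(size_t rows, size_t cols,
                     const float *a, const float *x, float *y)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            y[i] += a[i * cols + j] * x[j];
}

/* Unroll-and-jam (hypothetical sketch): unroll the outer loop by 2
   and fuse ("jam") the two copies of the inner loop, so that one
   load of x[j] serves two rows. */
void matvec_unroll_jam(size_t rows, size_t cols,
                       const float *a, const float *x, float *y)
{
    assert(a != NULL && x != NULL && y != NULL);

    size_t i = 0;
    for (; i + 1 < rows; i += 2) {
        float r0 = y[i];      /* accumulator for row i     */
        float r1 = y[i + 1];  /* accumulator for row i + 1 */
        for (size_t j = 0; j < cols; j++) {
            float xj = x[j];  /* loaded once, reused for both rows */
            r0 += a[i * cols + j] * xj;
            r1 += a[(i + 1) * cols + j] * xj;
        }
        y[i] = r0;
        y[i + 1] = r1;
    }

    /* Remainder row when rows is odd. */
    for (; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            y[i] += a[i * cols + j] * x[j];
}
```

In this sketch the number of loads of x is roughly halved, while each row still accumulates its products in the same order as the baseline. A larger unroll factor would reuse each loaded value more often at the cost of more live registers.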

Keywords/Search Tags:

SIMD, Array of Structure, Memory Access, Reuse of Vector Register, Loop Interchange, Loop Unroll and Jam

Related items

1	Register pressure guided loop optimization
2	Research On SIMD Auto-vectorization Optimization Technologies
3	The Design And Implementation Of Vector Memory Unit Of Multi-Width SIMD DSP
4	Research On Application Of Loop Transformation For Auto-vectorization
5	Research On Computing Memory For Cryptographic Algorithm
6	Design And Implementation Of SIMD Unaligned Memory Access Structure
7	Research On SIMD Vectorization Of Loop Nests And Its Optimization Techniques
8	Optimizations Of Memory-access For Stencil Computations On Shared-memory Multi-core Processor
9	Loop Realization And Optimization Based On X Stream Processor
10	Research Of SIMD Vectorization Algorithm And Regrouping Technology