
Research On The Performance Optimizations For Stencil Computations On ARM High-performance Processor

Posted on: 2017-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: L X Feng
Full Text: PDF
GTID: 2428330569499047
Subject: Software engineering
Abstract/Summary:
SoCs based on ARM's high-performance processors are candidates for the next generation of high-performance computing systems. The latest ARMv8 architecture is a multi-core architecture with a multi-level cache hierarchy that eases the mismatch between computation speed and memory access speed. It also extends the architecture to 64 bits and introduces new Advanced SIMD support. Together these features provide multi-level, multi-granularity parallel processing capabilities.

Stencil computation is an important class of computational kernel, widely used in image and video processing and in large-scale scientific and engineering calculation. Its performance optimization is receiving increasing attention, and existing work focuses mainly on parallelization and vectorization. This thesis studies the performance optimization of OpenMP parallelization and vectorization for stencil computation, using an ARMv8 processor as the platform. The main work and contributions are as follows.

First, the architectural features of the ARMv8 processor and typical OpenMP parallelization methods are studied. Based on the characteristics of the multi-level cache and the memory access pattern of the program, a thread binding method is applied that increases the cache hit ratio, reduces thread scheduling overhead, and improves parallel performance.

Second, the Advanced SIMD extension on the ARMv8 platform and current automatic vectorization methods are analyzed. Analysis of the stencil kernel shows that simple vectorization cannot improve the performance of multi-dimensional stencil computation. Therefore, the OpenMP `collapse` clause is used to collapse the nested loops of the stencil computation and is combined with the improved parallelization method. This enhances the effect of vectorization and further optimizes the performance of stencil computation on ARM high-performance processors.

Studying parallelization and vectorization optimizations for stencil computation on ARM high-performance processors is worthwhile: it reduces the limitations of the high-performance computing platform and meets the needs of more applications.
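The combination described in the abstract (thread binding, loop collapsing, and SIMD vectorization) can be illustrated with a minimal C sketch. This is not the thesis's code. The function name `stencil5_sweep`, the 2D 5-point Jacobi kernel, the 0.2 weighting, and the choice of `proc_bind(close)` with a static schedule are all assumptions made for illustration. Only the use of OpenMP, thread binding, and the `collapse` clause comes from the abstract.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: one Jacobi sweep of a 2D 5-point stencil
 * over the interior of an ny-by-nx row-major grid. Boundary cells of
 * `out` are left untouched. `in` and `out` must not overlap. */
void stencil5_sweep(const double *restrict in, double *restrict out,
                    int ny, int nx)
{
    assert(ny >= 3 && nx >= 3);

    /* collapse(2) merges the i and j loops into one iteration space that
     * is split across threads and vectorized with SIMD.
     * proc_bind(close) asks the runtime to keep threads on places near
     * the primary thread (an assumed binding policy, not the thesis's). */
    #pragma omp parallel for simd collapse(2) schedule(static) proc_bind(close)
    for (int i = 1; i < ny - 1; i++) {
        for (int j = 1; j < nx - 1; j++) {
            size_t c = (size_t)i * (size_t)nx + (size_t)j;
            out[c] = 0.2 * (in[c] + in[c - 1] + in[c + 1]
                            + in[c - (size_t)nx] + in[c + (size_t)nx]);
        }
    }
}
```

Built without OpenMP support, the pragma is ignored and the loop runs serially with the same result. With an OpenMP-capable compiler (for example `gcc -O2 -fopenmp`), thread placement can also be controlled through the standard `OMP_PROC_BIND` and `OMP_PLACES` environment variables. Which binding policy performs best depends on the cache topology of the target processor.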
Keywords/Search Tags:Stencil Computation, ARMv8 architecture, multi-level Cache, OpenMP, Thread binding, SIMD, collapse clause