Research On The Performance Optimizations For Stencil Computations On ARM High-performance Processor

Posted on:2017-02-12

Degree:Master

Type:Thesis

Country:China

Candidate:L X Feng

Full Text:PDF

GTID:2428330569499047

Subject:Software engineering

Abstract/Summary:

SoCs based on ARM's high-performance processors are candidates for the next generation of high-performance computing systems. The latest ARMv8 architecture is multi-core, has a multi-level cache hierarchy that eases the mismatch between computation speed and memory access speed, extends to 64 bits, and introduces new Advanced SIMD support. Together these provide multi-level, multi-granularity parallel processing capability. Stencil computation is an important class of computational kernel, widely used in image and video processing and in large-scale scientific and engineering computing. Its performance optimization is receiving increasing attention, and research on it focuses mainly on parallelization and vectorization. In this thesis, I study OpenMP parallelization and vectorization optimizations for stencil computation, using an ARMv8 processor as the platform. The main work and contributions are as follows. First, the architectural features of the ARMv8 processor and typical OpenMP parallelization methods are studied. Based on the characteristics of the multi-level cache and the program's memory access pattern, a thread binding method is used to increase the cache hit ratio, reduce thread scheduling overhead, and improve parallel performance. Second, the Advanced SIMD extension on the ARMv8 platform and current automatic vectorization methods are analyzed. Analysis of the stencil kernel shows that simple vectorization cannot improve the performance of multi-dimensional stencil computation. Therefore, I use the 'collapse' clause to collapse the nested loops of the stencil computation and combine it with the improved parallelization method, which enhances the effect of vectorization and further optimizes the performance of stencil computation on ARM high-performance processors. Studying parallelization and vectorization optimizations for stencil computation on ARM high-performance processors is meaningful: it not only reduces the limitations of high-performance computing platforms but also meets the needs of more applications.
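
The abstract does not reproduce the thesis code, so the following is only a minimal sketch of how thread binding and the 'collapse' clause might be combined on a stencil kernel. The function name stencil_sweep, the 5-point Jacobi pattern, the 0.25 weights, and the exact set of clauses are all assumptions made for illustration, not the author's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 5-point Jacobi sweep over the interior of an ny-by-nx grid
 * stored in row-major order. The kernel, its name, and its weights are
 * illustrative assumptions; the thesis abstract does not specify them. */
void stencil_sweep(const double *restrict in, double *restrict out,
                   int ny, int nx)
{
    assert(in != NULL && out != NULL && in != out);

    /* collapse(2) merges the i and j loops into one iteration space, which
     * is then divided among threads and vectorized by the simd construct.
     * proc_bind(close) asks the runtime to keep threads on places near the
     * primary thread. Without OpenMP enabled the pragma is ignored and the
     * loops run serially with the same result. */
    #pragma omp parallel for simd collapse(2) proc_bind(close) schedule(static)
    for (int i = 1; i < ny - 1; ++i) {
        for (int j = 1; j < nx - 1; ++j) {
            size_t c = (size_t)i * (size_t)nx + (size_t)j;
            /* Average of the four direct neighbours of cell (i, j). */
            out[c] = 0.25 * (in[c - nx] + in[c + nx] + in[c - 1] + in[c + 1]);
        }
    }
}
```

With an OpenMP-capable compiler this would typically be built with OpenMP enabled (for example -fopenmp with GCC), and thread placement can additionally be controlled through the standard OMP_PROC_BIND and OMP_PLACES environment variables, for instance OMP_PLACES=cores. Whether collapsing helps vectorization on a given ARMv8 core depends on the compiler and the grid shape, so the effect should be measured rather than assumed.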

Keywords/Search Tags:

Stencil Computation, ARMv8 architecture, multi-level Cache, OpenMP, Thread binding, SIMD, collapse clause

Related items

1	Optimizations Of Memory-access For Stencil Computations On Shared-memory Multi-core Processor
2	Automatic Generation And Performance Optimization Of Code In Stencil Computation
3	Research On The Auto-vectorization In Multi-thread And Multi-SIMD Parallelism
4	Research On The Key Techniques Of Directives Based Auto-vectorization
5	HOT Thread Level Speculation Research Based On OpenMP
6	The Research And Implementation Of The Key Techniques On Single Chip Multiprocessors
7	Study On The Key Technologies Of Thread-Level Speculation On Multi-core Platform
8	Research On Parallel Model And Compiler Optimization Technique Based On Multi-core
9	The Research Of Shared CMP Cache Management
10	Performance Optimization of Stencil Computations on Modern SIMD Architectures