
Research On Shared Last-level Cache Management Policy For CPU-GPU Heterogeneous Multiprocessor Architecture

Posted on: 2022-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: P F Xue
GTID: 2518306314974159
Subject: Software engineering

Abstract/Summary:
With the growing computing demands of deep learning, big data, and other emerging applications, traditional CPU-only architectures can no longer satisfy their computation requirements. Heterogeneous multiprocessor systems-on-chip (HMPSoC), which integrate CPU cores and GPU accelerators on the same chip, have become the mainstream trend. Besides excelling at 3D image rendering, GPUs can also perform large-scale general-purpose parallel computing. In CPU-GPU heterogeneous architectures, the cache is a key design for narrowing the bandwidth gap between the processors and main memory, and it has a significant impact on overall system performance. CPU and GPU cores share the last-level cache (LLC). Because technological constraints make it impossible to keep enlarging the LLC to satisfy growing application demands, future processors must rely on effective cache management policies to extract higher performance from limited LLC capacity. However, after being filtered by the upper-level caches, memory access requests exhibit poor locality at the LLC, so traditional replacement algorithms, which mainly rely on the spatial and temporal locality of requests, struggle to use the cache space effectively. Moreover, owing to architectural differences, CPU and GPU requests differ in frequency, pattern, and locality, and the two sides interfere with each other while competing for LLC space. Together, these access characteristics and the differences between CPU and GPU pose challenges for the design of LLC management policies. To address them, this thesis designs shared LLC replacement policies suited to heterogeneous multiprocessor architectures, based on an analysis of CPU and GPU access patterns at the LLC.

Portable smart terminals such as mobile phones and tablets are the most widely used embedded heterogeneous systems in daily life. Compared with PCs, they face stricter constraints on cost, size, and power consumption, which often limits their storage and computing resources. Large mobile games place the highest demands on system performance, and whether the shared LLC is used effectively while such games run largely determines the performance ceiling of smart terminal devices. To this end, this thesis first proposes a shared LLC management policy for mobile game applications and then extends the algorithm to general architectures to suit different application scenarios.

Based on an analysis of the cache access patterns of mobile game applications, this thesis proposes a shared LLC replacement policy built on cache bypass and prediction of cache-line reuse behavior. First, a cache bypass mechanism is proposed: only data that has already hit once in a small LRU buffer outside the cache may be inserted into the cache, while requests with poor locality are bypassed to reduce replacement operations on the cache. Second, an RDPV parameter adjustment mechanism is designed for the differences in CPU and GPU access frequency and pattern; the RDPV, an estimate of the re-reference interval of a cache block, is adjusted according to access recency and frequency. Finally, the two mechanisms are applied in the cache insertion and hit-promotion stages to manage the shared LLC.
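To make the idea concrete, the C++ sketch below illustrates first-touch bypass through an external LRU buffer combined with an RRIP-style RDPV used at insertion, hit-promotion, and victim selection. All names and constants here (LruBuffer, MAX_RDPV, the insertion offsets) are illustrative assumptions, not the thesis's exact design or parameter values.

```cpp
// Minimal sketch of the bypass + RDPV idea; names and constants are assumed.
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

constexpr uint8_t MAX_RDPV = 7;   // assumed 3-bit re-reference prediction value

// Small LRU-ordered filter outside the LLC that remembers recently missed
// block addresses.  A block becomes eligible for LLC insertion only after it
// has been seen once before, i.e. it "hits" in this buffer.
class LruBuffer {
public:
    explicit LruBuffer(size_t capacity) : capacity_(capacity) {}

    // Returns true if addr was already tracked (second touch -> insert into LLC);
    // otherwise records it and returns false (first touch -> bypass the LLC).
    bool touch(uint64_t addr) {
        auto it = map_.find(addr);
        if (it != map_.end()) {
            order_.erase(it->second);
            map_.erase(it);
            return true;                         // reuse observed: stop bypassing
        }
        if (!order_.empty() && order_.size() >= capacity_) {
            map_.erase(order_.back());           // evict the least recently seen addr
            order_.pop_back();
        }
        order_.push_front(addr);
        map_[addr] = order_.begin();
        return false;
    }

private:
    size_t capacity_;
    std::list<uint64_t> order_;
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> map_;
};

struct Line {
    uint64_t tag = 0;
    bool valid = false;
    uint8_t rdpv = MAX_RDPV;   // larger value = predicted longer re-reference interval
};

// Insertion: CPU fills start nearer to re-use, GPU fills nearer to eviction,
// reflecting their different access frequency and locality (offsets illustrative).
uint8_t insertion_rdpv(bool is_gpu) { return is_gpu ? MAX_RDPV - 1 : MAX_RDPV - 3; }

// Hit-promotion: step the line toward "near re-use" instead of resetting it,
// so both recency and frequency are reflected in the RDPV.
void promote(Line& line) { if (line.rdpv > 0) --line.rdpv; }

// Victim selection: evict the line predicted to be re-used furthest in the future.
size_t pick_victim(std::vector<Line>& cache_set) {
    for (;;) {
        for (size_t i = 0; i < cache_set.size(); ++i)
            if (!cache_set[i].valid || cache_set[i].rdpv == MAX_RDPV) return i;
        for (auto& l : cache_set) ++l.rdpv;      // age the whole set and retry
    }
}
```

In this sketch, an LLC miss would call LruBuffer::touch(addr): a false return means the fill is skipped (bypassed), while a true return means the block is installed with insertion_rdpv(is_gpu) and then promoted on subsequent hits.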
In real CPU-GPU heterogeneous multiprocessor architectures, the application scenarios are more complex, so the cache sensitivity of GPU applications is incorporated into the algorithm design and both the cache bypass mechanism and the parameter adjustment mechanism are refined: the LRU_buffer filters only access requests from cache-insensitive GPU applications, and the RDPV is adjusted dynamically according to the access frequencies and cache hit counts of the CPU and GPU. This improves the utilization of the shared LLC and the overall performance of the system.

Experimental results demonstrate the effectiveness of the approach. For large mobile game applications, memory access trace files captured at the last cache level of an ARM Cortex-A76 hardware platform are used to replay accesses to the LLC; compared with the traditional LRU replacement algorithm, the proposed method improves the hit rate by 4.2% and reduces memory access traffic by 6.3%. For workloads containing both CPU and GPU applications run directly on the gem5-gpu simulator, the method improves performance by 10.3% over LRU. The hardware overhead of the algorithm is only 0.58% of the LLC capacity, and the LRU_buffer requires only 24 KB of storage.
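The sensitivity-aware refinement described above could be organized roughly as in the following sketch: per-side access and hit counters drive a periodic re-tuning step that decides whether GPU requests still pass through the bypass filter and which insertion RDPV each side receives. The epoch length, threshold, and adjustment rule are illustrative assumptions rather than the thesis's exact settings.

```cpp
// Sketch of the dynamic, sensitivity-aware adjustment; constants are assumed.
#include <cstdint>

struct SideStats { uint64_t accesses = 0, hits = 0; };

class DynamicPolicy {
public:
    // Called on every LLC lookup result to maintain per-side statistics.
    void record(bool is_gpu, bool hit) {
        SideStats& s = is_gpu ? gpu_ : cpu_;
        ++s.accesses;
        if (hit) ++s.hits;
        if (++epoch_accesses_ == EPOCH) retune();
    }

    // GPU requests go through the LRU_buffer filter only while the GPU side
    // looks cache-insensitive (low hit rate over the last epoch).
    bool should_filter(bool is_gpu) const { return is_gpu && gpu_insensitive_; }

    uint8_t insertion_rdpv(bool is_gpu) const { return is_gpu ? gpu_insert_ : cpu_insert_; }

private:
    static constexpr uint64_t EPOCH = 100000;   // assumed re-tuning interval (accesses)
    static constexpr uint8_t  MAX_RDPV = 7;

    void retune() {
        auto rate = [](const SideStats& s) {
            return s.accesses ? double(s.hits) / double(s.accesses) : 0.0;
        };
        gpu_insensitive_ = rate(gpu_) < 0.05;   // assumed sensitivity threshold
        // Insert the side with the higher hit rate nearer to re-use so its lines
        // survive longer; the other side is inserted nearer to eviction.
        bool favor_cpu = rate(cpu_) >= rate(gpu_);
        cpu_insert_ = favor_cpu ? MAX_RDPV - 3 : MAX_RDPV - 1;
        gpu_insert_ = favor_cpu ? MAX_RDPV - 1 : MAX_RDPV - 3;
        cpu_ = gpu_ = SideStats{};
        epoch_accesses_ = 0;
    }

    SideStats cpu_, gpu_;
    uint64_t epoch_accesses_ = 0;
    bool gpu_insensitive_ = false;
    uint8_t cpu_insert_ = MAX_RDPV - 3, gpu_insert_ = MAX_RDPV - 1;
};
```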
Keywords/Search Tags:Heterogeneous Multiprocessor Architecture, CPU-GPU, Shared Last-level Cache, Cache Replacement Algorithm, Cache Bypass