Self-Tuning Data Prefetcher For Embedded System

Posted on: 2014-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xin
Full Text: PDF
GTID: 2268330425481402
Subject: Information and Communication Engineering
Abstract/Summary:
To address the memory wall problem in computer systems, modern processors resort to prefetching, which predicts future load addresses from the regular access patterns in applications in order to reduce cache misses. However, current prefetching methods have several problems: 1) prefetchers on mainstream commercial processors work on sequential address patterns, while linked pointer patterns are abundant in many applications; 2) current pointer prefetchers prefetch every address-like value, which yields an accuracy below 10%; and 3) prefetching on multicore systems can increase resource contention, which degrades system performance.

We have developed a cycle-accurate simulator, compatible with the MIPS32 ISA, that supports functional, timing, and cost models for single-core and multicore embedded processors. We propose solutions to the problems of current prefetching mechanisms. Based on the design space obtained from application analysis, we propose a bi-mode data prefetching solution for single-core embedded processors. The system tunes the aggressiveness of the two prefetching modes based on run-time information collected by hardware. Experimental results on the single-core simulator for the EEMBC, SPEC CPU2006, and OLDEN benchmarks show that the proposed prefetcher achieves average accuracies of 36%, 40%, and 56% on the three benchmarks, while the CDP prefetcher achieves 8%, 9%, and 24%, respectively. System performance improves by 7%, 6%, and 9% compared with the stream prefetcher, CDP, and GHB, respectively.

For multicore, multithreaded programming environments, we present a thread-classification-directed data prefetching mechanism that uses run-time information to tune the prefetching mode and aggressiveness, mitigating resource contention in the memory system.
Our solution has two new components: 1) a filtering mechanism that informs the hardware which prefetch requests can cause shared-data invalidation and should be discarded, and 2) a self-tuning prefetcher that uses run-time feedback to adjust each thread's data prefetching mode and parameters. On a 16-core system, filter prefetching and thread classification improve overall performance by 2% and 6%, respectively, over a baseline system. We compare our proposal with the feedback directed prefetching (FDP) technique and find that it performs 4% better on multicore systems while achieving a 4% lower energy-delay product.
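
The feedback idea behind a self-tuning prefetcher can be illustrated with a small software model. The sketch below is not the thesis design: the class name `PrefetchMode`, the thresholds `HIGH_ACCURACY` and `LOW_ACCURACY`, the `MAX_DEGREE` limit, and the interval-based update rule are all assumptions made for illustration. It only shows how per-mode accuracy counters (useful prefetches divided by issued prefetches) could raise or lower the aggressiveness of each prefetching mode.

```python
from dataclasses import dataclass

# Hypothetical tuning parameters; the abstract does not state the real ones.
HIGH_ACCURACY = 0.40   # at or above this, become more aggressive
LOW_ACCURACY = 0.10    # below this, become less aggressive
MIN_DEGREE = 1         # never throttle to zero, so accuracy can still be sampled
MAX_DEGREE = 4


@dataclass
class PrefetchMode:
    """Run-time state of one prefetching mode (for example stream or pointer)."""
    degree: int = 1   # number of prefetches issued per trigger
    issued: int = 0   # prefetches issued in the current interval
    useful: int = 0   # prefetched lines later hit by a demand access

    def accuracy(self) -> float:
        """Fraction of issued prefetches that turned out to be useful."""
        return self.useful / self.issued if self.issued else 0.0

    def end_interval(self) -> None:
        """Adjust the degree from this interval's counters, then reset them."""
        if self.issued:
            acc = self.accuracy()
            if acc >= HIGH_ACCURACY:
                self.degree = min(self.degree + 1, MAX_DEGREE)
            elif acc < LOW_ACCURACY:
                self.degree = max(self.degree - 1, MIN_DEGREE)
        self.issued = 0
        self.useful = 0


# Usage: one interval in which the stream mode is accurate and the pointer mode is not.
stream = PrefetchMode(degree=2)
pointer = PrefetchMode(degree=2)
stream.issued, stream.useful = 100, 60    # 60% accuracy
pointer.issued, pointer.useful = 100, 5   # 5% accuracy
for mode in (stream, pointer):
    mode.end_interval()
print(stream.degree, pointer.degree)  # prints "3 1"
```

In a multithreaded setting, one such state record per thread and per mode would be a natural extension, although how the thesis classifies threads is not described in the abstract.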
Keywords/Search Tags: data prefetch, multicore, self-tuning system, embedded processor