| Prefetching is a common technique for hiding off-chip memory latency and can effectively alleviate the "memory wall" problem. Traditional prefetchers predict the data to be accessed in the future by learning memory access patterns. However, the complexity of data organization makes memory accesses irregular, and the use of multi-core systems makes access patterns difficult to distinguish; both make it hard for traditional prefetchers to predict accurately. To this end, machine learning, with its capacity for modeling and learning complex problems, has been used to design prefetchers. However, existing machine learning-based prefetchers focus only on the accuracy of address prediction and ignore the importance of adjusting prefetcher aggressiveness. If prefetcher aggressiveness is not properly adjusted, cache pollution and performance gains cannot be effectively balanced, so cache performance cannot be further improved. To solve this problem, a Dynamic Prefetching Model Based on Performance-Aware and Branch Dueling (DPAD) is proposed. First, deep reinforcement learning is used to model the prefetcher: the characteristics of each access request are extracted as the input of the branch dueling agent, and the address delta and prefetch degree are taken as the output action. The prefetch request generated from the address delta and prefetch degree constitutes the agent's prefetch decision. Then, a filter-based method for calculating performance gain is designed. For each prefetch decision, the method calculates a reward value from the effect and the benefit of prefetching to evaluate the quality of the decision under the current system environment, and then assigns the reward to the agent so that it can adjust its strategies for address prediction and prefetcher aggressiveness. Moreover, to reduce cache pollution caused by prefetching, a prefetch-aware cache replacement strategy based on a protected segment is proposed. The strategy updates the cache with two different schemes by distinguishing prefetched data from demand-fetched data, and sets up a protected segment based on reuse distances so that prefetched data cannot evict the protected data. DPAD is evaluated on SPEC CPU2006, SPEC CPU2017, PARSEC, Ligra, and CloudSuite workloads using the ChampSim simulator and compared with four state-of-the-art prefetchers: Bingo, MLOP, Pythia, and DSPatch. The experimental results show that DPAD outperforms DSPatch and Pythia by 18.7% and 3.5%, respectively, on a twelve-core system and by 10.8% and 5.2%, respectively, with a 4 MB last-level cache, and that it maintains its performance advantage under different bandwidth configurations. |
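
The abstract states that a prefetch request is generated from an address delta and a prefetch degree, but it does not say how the two are combined. The following is a minimal sketch under stated assumptions, not the DPAD implementation: the delta is taken to be a stride in cache lines, the degree is taken to be the number of strided lines requested, prefetches are assumed not to cross a page boundary, and the names `generate_prefetches`, `LINE_SIZE`, and `PAGE_SIZE` are hypothetical.

```python
from typing import List

LINE_SIZE = 64    # assumed cache-line size in bytes (not stated in the abstract)
PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


def generate_prefetches(addr: int, delta: int, degree: int) -> List[int]:
    """Hypothetical helper: expand one (delta, degree) action into addresses.

    Returns up to `degree` line-aligned byte addresses, each `delta` cache
    lines further from the line containing `addr`. Expansion stops at the
    first target that would leave the page containing `addr`.
    """
    if delta == 0 or degree <= 0:
        return []  # a zero delta or zero degree means "do not prefetch"
    line = addr // LINE_SIZE
    page = addr // PAGE_SIZE
    targets: List[int] = []
    for k in range(1, degree + 1):
        target = (line + k * delta) * LINE_SIZE
        # Assumption: prefetching is confined to the triggering page.
        if target < 0 or target // PAGE_SIZE != page:
            break
        targets.append(target)
    return targets


if __name__ == "__main__":
    # Degree acts as the aggressiveness knob: a larger degree yields more requests.
    for degree in (1, 2, 4):
        print(degree, [hex(a) for a in generate_prefetches(0x1000, 2, degree)])
```

In this sketch, a higher degree issues more requests per trigger, which is one simple way to picture the trade-off between coverage and cache pollution that the reward is meant to balance.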