As CPU speeds continue to increase, memory access speeds lag further behind, making the memory bottleneck increasingly severe. Although caches can hide memory latency to some extent, cache misses still occur and waste computing resources. Software optimization is therefore particularly important for improving CPU cache utilization. Current software optimization techniques fall mainly into two categories: code reordering and prefetching. Code reordering modifies a program's memory layout before it runs so that the program exhibits strong spatial locality; however, existing code reordering techniques operate mainly at the binary level, which limits the scenarios in which they can be used. Prefetching learns the memory access pattern during program execution and loads data into the cache before it is accessed, reducing CPU stalls. Although machine learning can improve the coverage and accuracy of prefetching, existing learning-based prefetchers suffer from excessively long training times or insufficient coverage. This thesis studies software optimization based on code reordering and prefetching. Its main contributions are as follows:

(1) RCPD (Reorder Based on Cache Line, Page and Distance), a code reordering algorithm for source-level optimization that combines multiple kinds of information. Existing optimization algorithms mainly target the compilation and linking stages and require specific compilers or platforms. RCPD overcomes this limitation by rearranging functions at the source-code stage. First, runtime information about the program is collected through hardware sampling, and the program's call graph is processed with optimization algorithms. Then, taking into account information such as page crossings and function call relationships, RCPD rearranges functions and aligns small functions to cache lines to improve program locality. Experiments show that RCPD effectively reduces metrics such as cache misses and page faults across multiple compilers.

(2) GACP (GRU-Attention for Cache Prediction), a hardware-based last-level cache (LLC) prefetcher model. Existing prefetchers cannot balance prefetch accuracy, prefetch coverage and prefetch time. To address this, GACP uses the page offset as a feature, which reduces the output dimension of the network. Using offsets alone may reduce prefetch accuracy, so GACP also introduces a PC (Program Counter) sequence as input. Embedding layers reduce the dimensionality of both the PC and offset sequences. A GRU then captures short-term dependencies between addresses, while multi-head attention computes an attention weight for each historical address, allowing the model to attend to multiple positions in the input sequence simultaneously and learn the program's memory access pattern. Finally, a Softmax layer computes and outputs the prefetch result. If a page crossing occurs, the target page address is obtained from a page translation table. Experimental results show that GACP performs well in terms of accuracy, coverage, MPKI and IPC.

In summary, this thesis first uses the RCPD algorithm to optimize the static layout of software and improve program locality. It then uses the GACP prefetcher to learn the software's memory access patterns and proactively prefetch data that would otherwise cause cache misses, thereby effectively improving the software's execution speed. The thesis includes 41 figures, 5 tables, and 82 references.
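The RCPD workflow described above uses sampled call counts, rearranges functions, aligns small functions to cache lines and pays attention to page crossings. A toy sketch helps make these steps concrete. It does not reproduce RCPD. Instead it substitutes a Pettis-Hansen-style greedy chain merge over the call graph and measures page crossings rather than optimizing them directly. The sketch assumes 64-byte cache lines, 4 KiB pages and a hypothetical 64-byte "small function" threshold. The function names, sizes and call counts are made up.

```python
"""Toy call-graph-driven function reordering (a stand-in, not RCPD itself)."""

CACHE_LINE = 64   # bytes; assumed cache-line size
PAGE = 4096       # bytes; assumed page size
SMALL_FN = 64     # hypothetical "small function" threshold in bytes


def greedy_chains(sizes, edges):
    """Pettis-Hansen-style merge: join caller/callee chains, hottest edge first.

    sizes: {function: code size in bytes}
    edges: {(caller, callee): sampled call count}
    """
    chain_of = {f: [f] for f in sizes}
    for (caller, callee), _count in sorted(edges.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[caller], chain_of[callee]
        if a is b:
            continue                      # already in the same chain
        merged = a + b
        for f in merged:
            chain_of[f] = merged
    # Collect distinct chains in first-seen order.
    seen, chains = set(), []
    for f in sizes:
        if id(chain_of[f]) not in seen:
            seen.add(id(chain_of[f]))
            chains.append(chain_of[f])

    def heat(chain):                      # total sampled calls inside a chain
        members = set(chain)
        return sum(c for (x, y), c in edges.items() if x in members and y in members)

    return sorted(chains, key=heat, reverse=True)


def layout(order, sizes):
    """Assign start offsets; a small function that would straddle a line is aligned."""
    addr, start = 0, {}
    for f in order:
        size = sizes[f]
        straddles = addr // CACHE_LINE != (addr + size - 1) // CACHE_LINE
        if size <= SMALL_FN and straddles:
            addr = -(-addr // CACHE_LINE) * CACHE_LINE   # round up to next line
        start[f] = addr
        addr += size
    return start


def cross_page_calls(start, edges):
    """Sampled calls whose caller and callee start on different pages."""
    return sum(c for (a, b), c in edges.items() if start[a] // PAGE != start[b] // PAGE)


sizes = {"main": 200, "parse": 3030, "cold": 4000, "log": 1500, "hot_leaf": 40}
edges = {("main", "parse"): 10, ("parse", "hot_leaf"): 900,
         ("main", "log"): 5, ("main", "cold"): 1}

order = [f for chain in greedy_chains(sizes, edges) for f in chain]
print(order)                                                 # ['main', 'parse', 'hot_leaf', 'log', 'cold']
print(cross_page_calls(layout(list(sizes), sizes), edges))   # 905 (source order)
print(cross_page_calls(layout(order, sizes), edges))         # 1 (reordered)
```

In source order, the frequently sampled call from `parse` to `hot_leaf` spans two pages. After reordering, only the rarely called `cold` function sits on another page. `hot_leaf` is also moved to start on a cache-line boundary, so it occupies a single line.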
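This sketch shows one possible reading of how a page-offset feature shrinks GACP's output layer and how a page crossing is resolved. It rests on assumptions that the thesis does not state:

- cache lines are 64 bytes and pages are 4 KiB;
- the model predicts a cache-line delta of at most ±63 lines, giving 127 Softmax classes instead of one class per address;
- a hypothetical page translation table, keyed by (current page, direction), supplies the neighbouring page when the predicted line falls outside the current page.

```python
LINE_BITS = 6                                   # 64-byte cache lines (assumed)
PAGE_BITS = 12                                  # 4 KiB pages (assumed)
LINES_PER_PAGE = 1 << (PAGE_BITS - LINE_BITS)   # 64 cache lines per page
NUM_CLASSES = 2 * LINES_PER_PAGE - 1            # line deltas -63..+63 -> 127 classes


def split(paddr):
    """Return (page number, cache-line offset within the page)."""
    return paddr >> PAGE_BITS, (paddr >> LINE_BITS) & (LINES_PER_PAGE - 1)


def delta_to_class(delta):
    """Map a line delta in [-63, 63] to a Softmax class index in [0, 126]."""
    return delta + LINES_PER_PAGE - 1


def class_to_delta(cls):
    """Inverse of delta_to_class."""
    return cls - (LINES_PER_PAGE - 1)


def target_addr(paddr, cls, next_page):
    """Decode a predicted class into a prefetch address, or None.

    next_page is a hypothetical page translation table mapping
    (current page, +1 or -1) to the neighbouring physical page.
    """
    page, offset = split(paddr)
    new_offset = offset + class_to_delta(cls)
    step = new_offset // LINES_PER_PAGE         # -1, 0 or +1 (floor division)
    if step != 0:                               # page crossing
        page = next_page.get((page, step))
        if page is None:
            return None                         # no known translation: skip prefetch
    return (page << PAGE_BITS) | ((new_offset % LINES_PER_PAGE) << LINE_BITS)


addr = 0x5000 + 10 * 64                                            # page 5, line 10
print(split(addr))                                                 # (5, 10)
print(hex(target_addr(addr, delta_to_class(3), {})))               # 0x5340
print(hex(target_addr(addr, delta_to_class(60), {(5, 1): 9})))     # 0x9180
print(target_addr(addr, delta_to_class(-20), {(5, 1): 9}))         # None
```

Dropping a prefetch when no translation is known is a conservative choice. It avoids touching an unrelated physical page, because physically adjacent page numbers need not be virtually adjacent.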
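The GACP data flow described above runs from PC and offset embeddings, through a GRU over the access history and multi-head attention weighting each historical access, to a Softmax output. A pure-Python forward pass can illustrate it. Everything concrete here is an assumption made for illustration, not the thesis's architecture:

- toy sizes;
- hashing PCs into 16 buckets;
- using the last GRU state as the attention query;
- random, untrained weights.

The GRU gates follow the common PyTorch-style formulation with biases omitted.

```python
import math
import random

random.seed(0)
E, H, HEADS = 4, 8, 2                      # toy sizes (assumed): embedding, hidden, heads
PC_BUCKETS, OFFSETS, CLASSES = 16, 64, 127 # hypothetical PC hash buckets; offset and output classes


def mat(rows, cols):
    return [[random.gauss(0, 0.3) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(v):
    m = max(v)                             # subtract max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


# Randomly initialised parameters; a real prefetcher would train these.
pc_emb, off_emb = mat(PC_BUCKETS, E), mat(OFFSETS, E)
W = {g: mat(H, 2 * E) for g in "rzn"}      # input-to-hidden weights per gate
U = {g: mat(H, H) for g in "rzn"}          # hidden-to-hidden weights per gate
Wq, Wk, Wv, Wo = mat(H, H), mat(H, H), mat(H, H), mat(CLASSES, H)


def gru_step(x, h):
    """One GRU step: reset gate r, update gate z, candidate state n."""
    r = [sigmoid(a + b) for a, b in zip(matvec(W["r"], x), matvec(U["r"], h))]
    z = [sigmoid(a + b) for a, b in zip(matvec(W["z"], x), matvec(U["z"], h))]
    n = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(W["n"], x), r, matvec(U["n"], h))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def multi_head_attention(hs):
    """Last hidden state queries every hidden state; returns (context, per-head weights)."""
    d = H // HEADS
    q = matvec(Wq, hs[-1])
    ks = [matvec(Wk, h) for h in hs]
    vs = [matvec(Wv, h) for h in hs]
    context, weights = [], []
    for head in range(HEADS):
        sl = slice(head * d, (head + 1) * d)
        scores = [sum(a * b for a, b in zip(q[sl], k[sl])) / math.sqrt(d) for k in ks]
        w = softmax(scores)                # one weight per historical access
        weights.append(w)
        context += [sum(wt * v[sl][i] for wt, v in zip(w, vs)) for i in range(d)]
    return context, weights


def predict(pcs, offsets):
    """Embed (PC, offset) pairs, run the GRU, attend, and return class probabilities."""
    h, hs = [0.0] * H, []
    for pc, off in zip(pcs, offsets):
        x = pc_emb[pc % PC_BUCKETS] + off_emb[off]   # concatenated embeddings
        h = gru_step(x, h)
        hs.append(h)
    context, attn = multi_head_attention(hs)
    return softmax(matvec(Wo, context)), attn


pcs = [0x401a10, 0x401a24, 0x401a10, 0x401a24]   # hypothetical load PCs
offsets = [3, 4, 5, 6]                           # cache-line offsets within a page
probs, attn = predict(pcs, offsets)
best = max(range(CLASSES), key=probs.__getitem__)
print("predicted class:", best)
print("head weights:", [[round(w, 3) for w in head] for head in attn])
```

Because the weights are untrained, the predicted class carries no meaning here. The sketch only shows the data flow and that each attention head yields a distribution over the historical accesses.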
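The four evaluation metrics named above have standard definitions, sketched below. Coverage is computed as useful prefetches divided by useful prefetches plus remaining demand misses. That is one common convention; others normalize by the baseline miss count without prefetching. The counter values are made up and are not results from the thesis.

```python
def prefetch_metrics(issued, useful, remaining_misses, instructions, cycles):
    """Standard prefetcher metrics from (hypothetical) simulator counters.

    issued:           prefetches sent to the LLC
    useful:           prefetched lines later hit by a demand access
    remaining_misses: LLC demand misses still observed with the prefetcher on
    """
    accuracy = useful / issued
    coverage = useful / (useful + remaining_misses)   # share of would-be misses removed
    mpki = remaining_misses * 1000 / instructions      # misses per kilo-instruction
    ipc = instructions / cycles                        # instructions per cycle
    return accuracy, coverage, mpki, ipc


# Made-up counter values, for illustration only.
acc, cov, mpki, ipc = prefetch_metrics(issued=1000, useful=800, remaining_misses=600,
                                       instructions=2_000_000, cycles=1_600_000)
print(f"accuracy={acc:.2f} coverage={cov:.2f} MPKI={mpki:.2f} IPC={ipc:.2f}")
# accuracy=0.80 coverage=0.57 MPKI=0.30 IPC=1.25
```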