Address Translation Optimizations for Chip Multiprocessor

Posted on: 2018-05-24
Degree: Ph.D.
Type: Thesis
University: University of Toronto (Canada)
Candidate: Papadopoulou, Misel-Myrto
Full Text: PDF
GTID: 2445390002450971
Subject: Computer Engineering
Abstract/Summary:
Address translation is an essential part of current systems. Getting the virtual-to-physical mapping of a page is a time-sensitive operation that precedes the vast majority of memory accesses, be it for data or instructions. The growing memory footprints of current workloads, as well as the proliferation of chip multiprocessor systems with a variety of shared on-chip resources, create both challenges and opportunities for address translation research. This thesis presents an in-depth analysis of the TLB-related behaviour of a set of commercial and cloud workloads. This analysis highlights workload nuances that can influence address translation's performance, as well as shortcomings of current designs. The thesis then presents two architectural proposals, both of which support its central claim that TLB designs and policies need not be rigid, but should instead dynamically adapt to the workloads' behaviour for a judicious use of the available on-chip resources.

The Prediction-Based Superpage-Friendly TLB proposal leverages prediction to improve the energy efficiency and utilization of TLBs by allowing translations of different page sizes to coexist in a set-associative (SA) structure. For example, a 256-entry 4-way SA TLBpred achieves better coverage (7.7% fewer Misses Per Million Instructions) than a slower 128-entry fully-associative TLB. It also has the energy efficiency of a much smaller structure. This design uses a highly accurate superpage predictor that achieves a 0.4% average misprediction rate with a meager 32B of storage.

The Forget-Me-Not TLB (FMN) proposal utilizes the existing cache capacity to store translation entries and thus reduce TLB-miss handling latency. A per-core private 1024-entry direct-mapped FMN reduces the average L1-TLB miss latency across all simulated workloads by 31.4% over a baseline with only L1-TLBs. By comparison, a dedicated 1024-entry 8-way SA L2-TLB reduces it by 24.6% and causes, in some cases, performance degradation. We further propose an L2-TLB bypassing mechanism to address this challenge.
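
As a rough illustration of why a set-associative TLB that mixes page sizes benefits from a page-size prediction, the sketch below shows that the set index depends on the page size assumed at lookup time. It is not the thesis's design: the 2 MiB superpage size, the function names, and the example address are assumptions made for this example, and the 64 sets simply follow from the 256-entry 4-way configuration mentioned above. The `mpmi` helper spells out the Misses Per Million Instructions metric.

```python
# Illustrative sketch only; names and the 2 MiB superpage size are assumptions.
PAGE_SHIFT_4K = 12          # 4 KiB base page: low 12 bits are the page offset
PAGE_SHIFT_2M = 21          # 2 MiB superpage (assumed): low 21 bits are the offset
NUM_SETS = 256 // 4         # 256 entries, 4 ways -> 64 sets


def set_index(vaddr: int, page_shift: int, num_sets: int = NUM_SETS) -> int:
    """Pick the set from the low bits of the virtual page number.

    The virtual page number is the address with the page offset removed,
    so a different page size selects different address bits.
    """
    return (vaddr >> page_shift) % num_sets


def mpmi(misses: int, instructions: int) -> float:
    """Misses Per Million Instructions."""
    return misses * 1_000_000 / instructions


if __name__ == "__main__":
    vaddr = 0x7F1234567000  # hypothetical virtual address
    print("set if 4 KiB page:", set_index(vaddr, PAGE_SHIFT_4K))
    print("set if 2 MiB page:", set_index(vaddr, PAGE_SHIFT_2M))
    print("MPMI:", mpmi(misses=50, instructions=1_000_000))
```

Because the same address can map to different sets under the two page sizes, a lookup that does not know the page size in advance would have to probe more than one set; guessing the size first lets a single probe suffice when the guess is right.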
Keywords/Search Tags: Address, Translation, TLB