Deep scaling of memories, in conjunction with increased process variation, has resulted in increasingly faulty memories. Emerging memories, particularly phase-change and resistive memories, are promising alternatives to conventional DRAM main memories because of their read performance, density, and nonvolatility, and the low static energy that nonvolatility brings. Unfortunately, reliability remains a significant challenge: limited write endurance, exacerbated by process variation, leads to an increasing number of stuck-at faults over the memory's lifetime, including a significant number that appear early in the memory's service. Fault-tolerance schemes and wear-leveling methods are the two main approaches to the limited-endurance problem. The first focuses on recovering from faults after they appear and includes encoding-and-correction schemes, pointer-based schemes, and partition-and-flip (PAF) schemes; the second attempts to avoid early faults by balancing writes across the memory. To extend the lifetime of emerging memories, this dissertation proposes two PAF fault-mitigation schemes, one fault-correction scheme that combines PAF and pointer-based schemes, and a novel wear-leveling scheme. The main work and contributions of this dissertation are as follows.

1. We propose two novel correction schemes that substantially enhance the fault-tolerance capabilities of existing partition-and-flip techniques. One class of solutions to the stuck-at fault problem partitions data into blocks and inverts blocks as needed so that the written data matches the values held by stuck-at cells (partition-and-flip schemes). In Chapter 2, we present two such schemes. First, dynamic partitioning increases the number of possible partitioning configurations for the same number of
auxiliary bits. Second, relaxed partitioning dramatically increases the partitioning search space by specifying minimally overlapping configurations. Experimental studies, including Monte Carlo simulations, benchmark-based simulations, and probabilistic simulations, demonstrate that the two proposed schemes outperform their counterparts in fault mitigation.

2. We propose Yoda, a method that extends the coverage of Error-Correcting Pointers (ECP) by using a small number of additional encoding bits to dramatically improve ECP's effectiveness and fault-correction capability. ECP, proposed by Microsoft, is a popular approach to mitigating stuck-at faults in PCM: it records the addresses of faulty bits together with their intended values in order to extend the lifetime of the memory. However, ECP can recover only a few faults per block, at a moderate area overhead. In Chapter 3, we describe Yoda in detail. Adding a single bit to an ECP scheme that corrects f faults allows Yoda to correct 2f+1 faults, and further improvements are possible by introducing small numbers of additional bits. Our simulation results show that Yoda achieves a 3.0× improvement in fault coverage over a fault-aware ECP with similar overhead, and a 2.5–3.0× improvement over state-of-the-art schemes of comparable complexity. Furthermore, Yoda provides a low-overhead method to protect the auxiliary bits themselves: adding one more auxiliary bit for this purpose yields a further improvement.

3. We propose RETROFIT, a new technique that combines error correction, row sparing, and wear leveling to prolong memory lifetime. Phase-change memory (PCM) and resistive memory (ReRAM) are promising alternatives to traditional memory
technologies. However, both PCM and ReRAM suffer from limited write endurance, and because of process variation from scaling, an increasing number of early cell failures continues to put pressure on wear-leveling and fault-tolerance techniques. In Chapter 4, we propose RETROFIT, which strategically reuses the spare “gap” row that wear leveling uses as temporary storage to also guard against early cell wear-out. RETROFIT is compatible with error-correction schemes targeted at mitigating stuck-at faults, and it provides benefits whether a single spare row or multiple spare rows are available. RETROFIT improves lifetime by as much as 107% over traditional gap-based wear leveling and by 8% over perfect wear leveling, with a similar overhead. Furthermore, RETROFIT scales better than wear leveling combined with error correction as process variation increases.

In summary, this dissertation focuses primarily on the limited-endurance problem of emerging memories and proposes fault-tolerance schemes and a wear-leveling method to tackle it. The effectiveness of each proposed scheme and method is studied extensively through several kinds of experiments and compared against its respective state-of-the-art counterparts.
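To make the partition-and-flip idea concrete, the following Python sketch shows a generic PAF encoder with fixed, equal-size contiguous partitions and one flip bit per block. It is not the dynamic or relaxed partitioning proposed in Chapter 2; the function names and the representation of stuck-at faults as a map from cell index to stuck value are illustrative assumptions.

```python
from typing import Dict, List, Optional


def paf_encode(data: List[int], stuck: Dict[int, int], num_blocks: int) -> Optional[List[int]]:
    """Return one flip bit per block, or None if some block cannot be written.

    Hypothetical helper: `stuck` maps a cell index to its stuck-at value (0 or 1).
    Assumes fixed, equal-size contiguous partitions.
    """
    n = len(data)
    if n % num_blocks != 0:
        raise ValueError("data length must be divisible by num_blocks")
    size = n // num_blocks
    flips = []
    for b in range(num_blocks):
        lo, hi = b * size, (b + 1) * size
        # Stuck cells that fall inside this block
        faults = [(i, v) for i, v in stuck.items() if lo <= i < hi]
        if all(data[i] == v for i, v in faults):
            flips.append(0)  # writing the block as-is matches every stuck cell
        elif all(data[i] != v for i, v in faults):
            flips.append(1)  # writing the inverted block matches every stuck cell
        else:
            return None  # mixed pattern: no single flip bit works for this block
    return flips


def paf_write(data: List[int], flips: List[int], num_blocks: int) -> List[int]:
    # XOR every bit with the flip bit of the block it belongs to
    size = len(data) // num_blocks
    return [bit ^ flips[i // size] for i, bit in enumerate(data)]


def store(written: List[int], stuck: Dict[int, int]) -> List[int]:
    # Simulate the array: stuck cells keep their stuck value regardless of the write
    return [stuck.get(i, b) for i, b in enumerate(written)]


def paf_decode(cells: List[int], flips: List[int], num_blocks: int) -> List[int]:
    # Inversion is its own inverse, so decoding re-applies the same flips
    return paf_write(cells, flips, num_blocks)


data = [1, 0, 1, 1, 0, 0, 1, 0]
stuck = {1: 1, 6: 0}
flips = paf_encode(data, stuck, 2)
cells = store(paf_write(data, flips, 2), stuck)
print(flips, paf_decode(cells, flips, 2) == data)
```

With two partitions, cells 0 and 1 both stuck at 1 cannot be reconciled with data bits 1 and 0 in the same block, whereas eight single-bit partitions can accommodate them at the cost of more flip bits. This trade-off between auxiliary bits and the fault patterns that can be covered is the design space that richer partitioning schemes explore.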
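The pointer-based correction that Yoda builds on can be illustrated with a toy model of ECP: each entry holds a pointer to a faulty cell plus a replacement bit that overrides that cell on reads, so f entries correct at most f faults. This sketch assumes the faults are already known (for example, from write-verify); the helper names and the overhead accounting are illustrative assumptions, and Yoda's additional encoding bits are not modeled.

```python
import math
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, int]  # (pointer to faulty cell, replacement bit value)


def ecp_write(data: List[int], stuck: Dict[int, int], f: int) -> Optional[Tuple[List[int], List[Entry]]]:
    """Store `data` in cells with known stuck-at faults using up to f ECP entries.

    Returns (cell contents, entries), or None if there are more faults than entries.
    Simplification: every known stuck cell gets an entry, although only cells whose
    stuck value disagrees with the data bit strictly need one.
    """
    if len(stuck) > f:
        return None
    entries = [(i, data[i]) for i in sorted(stuck)]
    cells = [stuck.get(i, b) for i, b in enumerate(data)]  # stuck cells ignore writes
    return cells, entries


def ecp_read(cells: List[int], entries: List[Entry]) -> List[int]:
    out = list(cells)
    for ptr, bit in entries:
        out[ptr] = bit  # the replacement bit overrides the faulty cell
    return out


def ecp_overhead_bits(n: int, f: int) -> int:
    # Assumed accounting: each entry = ceil(log2 n) pointer bits + 1 replacement bit,
    # plus one flag bit per block.
    return f * (math.ceil(math.log2(n)) + 1) + 1


data = [1, 0, 1, 1, 0, 0, 1, 0]
result = ecp_write(data, {1: 1, 6: 0}, f=2)
cells, entries = result
print(entries, ecp_read(cells, entries) == data, ecp_overhead_bits(512, 6))
```

Under this accounting, protecting a 512-bit block against six faults costs 61 bits, which illustrates why ECP's coverage per bit of overhead is limited and why squeezing more correction out of a few extra bits is attractive.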
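RETROFIT builds on gap-based wear leveling, in which one spare row (the “gap”) rotates through the memory so that writes are spread across all rows. The sketch below assumes the Start-Gap scheme of Qureshi et al. as the underlying gap-based method; the class name `StartGap` and the gap-movement interval `psi` are hypothetical names, and the sketch does not model RETROFIT's reuse of the gap row for fault tolerance.

```python
from typing import List, Optional


class StartGap:
    """Sketch of Start-Gap style wear leveling over n logical lines.

    n + 1 physical lines are used; the extra one is the spare 'gap' line.
    Every `psi` writes, the gap moves down by one line (its neighbor is copied into it).
    """

    def __init__(self, n: int, psi: int):
        self.n, self.psi = n, psi
        self.start, self.gap = 0, n  # the gap starts at the spare line
        self.mem: List[Optional[object]] = [None] * (n + 1)
        self.writes = 0

    def translate(self, la: int) -> int:
        # Rotate by Start, then skip over the gap line
        pa = (la + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def write(self, la: int, value: object) -> None:
        self.mem[self.translate(la)] = value
        self.writes += 1
        if self.writes % self.psi == 0:
            self._move_gap()

    def read(self, la: int) -> Optional[object]:
        return self.mem[self.translate(la)]

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Wrap around: move line n into slot 0 and advance Start
            self.mem[0] = self.mem[self.n]
            self.mem[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Copy the line just above the gap into the gap
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.mem[self.gap - 1] = None
            self.gap -= 1


sg = StartGap(n=4, psi=1)
for la, v in enumerate("abcd"):
    sg.write(la, v)
print(sg.mem, sg.start, sg.gap)
```

Because the gap line is empty at any moment, it can serve as temporary storage during each move; this spare row is exactly the resource that RETROFIT proposes to reuse strategically against early wear-out.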