Research And Implementation Of Optimization Technology Of Fault-tolerance Parallel Filesystem Based On Erasure Codes

Posted on: 2016-02-11 | Degree: Master | Type: Thesis
Country: China | Candidate: W Lu | Full Text: PDF
GTID: 2348330536467463 | Subject: Computer Science and Technology
Abstract/Summary:
As HPC (high-performance computing) systems grow in scale, traditional hardware fault-tolerance methods can no longer meet the demand for high reliability, and protecting data from failures in software has become the general trend. Compared with other methods, erasure codes (EC) offer high storage efficiency and strong fault tolerance, and they have received wide attention in the cloud storage field. How to apply EC to HPC storage systems, so that data is protected from failures at the software layer, has therefore become a hot topic in the field. To study the feasibility of applying erasure codes to HPC storage systems, we analyze different erasure codes in terms of computing cost, decoding cost, fault tolerance, and storage efficiency, and we also analyze the hot issues in applying EC to HPC storage systems. Based on this analysis, we studied the following topics.

First, we analyze the effect of erasure codes on the performance of HPC storage systems and optimize key I/O operations. We propose a caching technique based on data locality to reduce the number of encoding computations, and a pipelining technique based on multiple parallel pipelines to further hide the overhead of decoding computation and of writing parity blocks in sequential write operations. We also theoretically quantify the effect of these optimization techniques.

Second, we study how to make full use of EC's recovery capability to improve the reliability and performance of the storage system, specifically in the following three aspects: 1) We analyze the effect of EC on recovery operations and propose an optimized solution. 2) We analyze the effect of erasure codes on degraded read operations and propose the PC-Read method, based on a pre-reading mechanism and caching techniques, to optimize degraded reads that occur during sequential reads; we also theoretically quantify the effect of these optimization techniques on degraded reads. 3) We propose the CR-Read method to solve the problem of inefficient collective reads caused by I/O variability, together with an algorithm that determines the best time to use this method.

To test the optimization methods, we designed and implemented EDFS, a fault-tolerant prototype system based on EC and deployed on a parallel file system. EDFS supports verifying the actual results of the sequential-write optimization, the PC-Read method, and the CR-Read method. From the data analysis, we conclude that, with these optimizations, EC-based fault tolerance can be applied to HPC storage systems when both data access performance and reliability are considered.
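To make the terms "parity block", "degraded read", and "storage efficiency" concrete, here is a minimal Python sketch. It is an illustration only: the abstract does not say which erasure code EDFS uses, so the sketch assumes the simplest possible one, a single XOR parity block over k data blocks, which tolerates the loss of any one block. All function names (`xor_blocks`, `encode`, `degraded_read`, `storage_efficiency`) are hypothetical and are not taken from the thesis.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Compute the single parity block of a stripe (assumed XOR code)."""
    return xor_blocks(data_blocks)


def degraded_read(stripe, lost_index):
    """Rebuild the block at lost_index from all surviving blocks of the stripe.

    With XOR parity, the XOR of every surviving block (data and parity)
    equals the missing block.
    """
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_efficiency(k, m):
    """Fraction of raw capacity that holds user data: k / (k + m)."""
    return k / (k + m)


# One stripe with k = 3 data blocks and m = 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(data)
stripe = data + [parity]

# Simulate losing data block 1 and serving the read from the survivors.
recovered = degraded_read(stripe, lost_index=1)
efficiency = storage_efficiency(k=3, m=1)
```

A degraded read here has to fetch every surviving block of the stripe to serve one lost block, which is the kind of extra I/O and computation that pre-reading and caching aim to amortize during sequential reads. Codes that tolerate more than one failure, such as Reed-Solomon, replace the XOR with arithmetic over a finite field, but the k / (k + m) efficiency formula is the same.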
Keywords/Search Tags:Erasure Codes, Parallel Filesystem, Fault-tolerance, Reliability