Research On Repair Model Of Generalized Regenerating Codes For Clustered Distributed Storage System

Posted on: 2021-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: K Li
Full Text: PDF
GTID: 2428330611499428
Subject: Information and Communication Engineering
Abstract/Summary:
With the advent of the big data era, data is growing in volume and becoming globally distributed, pushing distributed storage systems toward the "cloud". These systems have developed into clustered distributed storage systems composed of data centers in different parts of the world. To save infrastructure cost, the nodes in such a system are often cheap and unreliable devices, so node failure is the norm. In recent years, regenerating codes based on network coding have become a research hotspot for node repair in traditional distributed storage systems because they minimize the repair bandwidth. The generalized regenerating code extends the regenerating code to clustered distributed storage systems. It distinguishes the repair bandwidth within clusters from the repair bandwidth across clusters, and it can significantly reduce both the storage overhead and the cross-cluster repair bandwidth, improving system availability.

However, the theory of generalized regenerating codes is still in its infancy, and two problems remain to be studied. First, generalized regenerating codes reduce the cross-cluster repair bandwidth by increasing the relatively inexpensive local repair bandwidth, but because there is no exact mathematical relationship between the coding parameters and the repair bandwidth, it is difficult to quantify how much a specific generalized regenerating code reduces the system repair cost. Second, existing generalized regenerating codes do not consider the difference in bandwidth cost between clusters: the cross-cluster repair process is still symmetric, which limits their application in practical systems.

The main research contents of this thesis for these two problems are as follows. The basic theory of network coding is introduced, and the relationships and differences between the regenerating code and the generalized regenerating code in terms of the repair model, the information flow graph, and the code construction are described in detail. This clarifies the superior performance of the generalized regenerating code and lays the foundation for its theoretical analysis and model optimization.

For the first problem, based on the upper-bound formula of the generalized regenerating code, we use linear programming to determine the parameters that achieve the minimum storage overhead and the minimum cross-cluster repair bandwidth. According to the characteristics of clustered distributed storage systems, a transmission cost model is established and a global repair bandwidth cost is defined to measure intra-cluster and cross-cluster bandwidth costs uniformly. Under the optimal parameters, we analyze the global repair bandwidth cost under different restrictions on the local help available to the generalized regenerating code, and obtain the specific relationship between the number of local helper nodes and the global repair bandwidth cost. This provides theoretical guidance for configuring the parameters of generalized regenerating codes.

For the second problem, we change the inter-cluster repair process of the generalized regenerating code to asymmetric repair, prove the upper bound on the achievable capacity based on the information flow graph, and derive the constraint that the intra-cluster repair bandwidth must satisfy to reach this bound. Based on the capacity constraint and the intra-cluster repair bandwidth constraint, the global repair bandwidth cost of the asymmetric generalized regenerating code is formulated as a linear programming problem in the inter-cluster repair bandwidth. Simulation results show that, under the same storage overhead, the asymmetric model effectively reduces the global repair bandwidth cost compared with the symmetric model, which increases system availability.
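The abstract does not give the cost formula or the capacity constraint, so the following Python sketch only illustrates the general idea of a global repair bandwidth cost and why asymmetric cross-cluster repair can be cheaper. Everything in it is an assumption made for illustration: the cost is taken to be a price-weighted sum of intra-cluster and per-cluster cross-cluster traffic, and the thesis's capacity constraint is replaced by a toy constraint (a fixed total cross-cluster download with a per-cluster cap). The function names and the numbers are hypothetical.

```python
from typing import List, Sequence


def global_repair_cost(intra_bw: float, inter_bw: Sequence[float],
                       intra_price: float, inter_prices: Sequence[float]) -> float:
    """Price-weighted sum of intra-cluster and cross-cluster repair traffic.

    Assumed cost model, not the thesis's exact definition.
    """
    if len(inter_bw) != len(inter_prices):
        raise ValueError("one price is needed per remote cluster")
    return intra_price * intra_bw + sum(p * b for p, b in zip(inter_prices, inter_bw))


def symmetric_split(total: float, n_clusters: int) -> List[float]:
    """Symmetric repair: every remote cluster sends the same amount."""
    return [total / n_clusters] * n_clusters


def cheapest_first_split(total: float, prices: Sequence[float], cap: float) -> List[float]:
    """Asymmetric repair under the toy constraint.

    Minimizes sum(price_i * b_i) subject to sum(b_i) == total and
    0 <= b_i <= cap. For this simple linear program, filling the
    cheapest links first is optimal.
    """
    alloc = [0.0] * len(prices)
    remaining = total
    for i in sorted(range(len(prices)), key=lambda k: prices[k]):
        take = min(cap, remaining)
        alloc[i] = take
        remaining -= take
    if remaining > 1e-12:
        raise ValueError("per-cluster cap too small to deliver the total")
    return alloc


if __name__ == "__main__":
    prices = [1.0, 2.0, 4.0]   # hypothetical per-unit cross-cluster prices
    sym = symmetric_split(6.0, len(prices))
    asym = cheapest_first_split(6.0, prices, cap=3.0)
    print("symmetric :", global_repair_cost(4.0, sym, 0.1, prices))
    print("asymmetric:", global_repair_cost(4.0, asym, 0.1, prices))
```

In this toy setting the asymmetric split shifts traffic onto the cheaper links, so its cost is never higher than the symmetric one. The actual model in the thesis constrains the inter-cluster bandwidths through the information-flow-graph capacity bound rather than a fixed total.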
Keywords/Search Tags: clustered distributed storage system, generalized regenerating codes, repair bandwidth cost, asymmetric repair model