| Modern data centers provide storage services by running distributed storage systems on many storage servers that occasionally malfunction.To ensure availability and reliability of data,distributed storage systems either employ replication(that is,to store multiple copies of all blocks)or erasure codes(multiple blocks of data are encoded into a set of coded blocks)so that an unavailable block of data can be reconstructed from a subset of the entire coded blocks.Compared with replication,erasure codes can provide data availability with lower storage overhead.Therefore,many distributed storage systems use erasure codes technology to provide reliable storage services.Erasure codes,however,have an obvious drawback.When reconstructing an unavailable block,erasure codes will have a high repair cost,and the data transmitted on the network is much larger than the original data block to be recovered,resulting in poor repair performance.The piggybacking framework is able to solve this problem well,simultaneously satisfying maximum-distance-separability(MDS),high rate and a small number of substripes,which solves the conflict between coding theory and practical systems.We aim to solve the repair problem of distributed storage codes by studying coding theory based on the piggybacking framework.We first focus on how to repair systematic nodes and parity nodes evenly to reduce the average repair cost of all nodes.By designing a generic piggybacking code and analyzing the key factor that affect its repair performance,we propose a specific piggybacking code and analyze the lower bound of its repair cost.Analysis shows that our piggybacking code effectively reduces the average repair bandwidth ratio of all nodes.We then focus on reducing the repair cost of parity nodes of existing piggybacking codes.Using the generic transformation tool,we optimized the REPB code for repairing parity nodes,and obtained the piggybacking code that the repair of parity nodes reaches the MSR code boundary,which is called GREPB code.GREPB code can effectively reduce the repair cost of parity nodes by expanding the number of REPB instances and performing permutation and pairs of linear transformations in parity nodes.Meanwhile,the repair cost of systematic nodes is the same as that of the underlying REPB code.Performance analysis shows that the average repair cost of parity nodes and all nodes is very low,but this is at the cost of multiplying the number of substripes.Finally,given the requirements of the practical system,we focuse on how to design erasure codes for efficient repair under the premise of low number of substripes.By using REPB code as the underlying code and performing additional piggybacking operations,we obtained a piggybacking code that can efficiently repair systematic nodes and parity nodes while satisfying the number of substripes at the square root level,called NREPB code.Performance analysis shows that NREPB code reduces the average repair cost of parity nodes and all nodes while maintaining the efficient repair of systematic nodes.In addition,the number of substripes at the square root level also reduces the complexity of the algorithm. |