Performance Optimization In Code-based Distributed Storage Systems

Posted on: 2017-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Tong
GTID: 2308330485966365
Subject: Software engineering
Abstract/Summary:
With the rise of cloud computing and big data technology, reliable mass data storage has become a hot topic. For mass data storage applications, distributed storage systems built on cheap commodity servers offer cost and performance advantages over traditional, expensive storage arrays. To ensure data reliability, distributed storage systems usually adopt a redundancy strategy, such as multi-replica storage or encoding. Among the encoding strategies, regenerating codes and locally repairable codes were proposed to reduce the bandwidth and disk I/O overhead of the node repair process. Encoding strategies improve storage efficiency and reduce traffic overhead during node repair. However, other costs introduced by encoding, such as the computational overhead of the encoding and decoding process, create new performance bottlenecks. In a code-based distributed storage system, these bottlenecks can be reduced by making full use of system resources. For example, in the node repair process, network topology information can be used to further reduce the repair delay. This thesis aims to build high-performance storage systems, focusing on node repair mechanisms based on network topology and on caching mechanisms for coding matrices in code-based distributed storage systems. The main work is listed below:
1) Designing and implementing a cache module for coding matrices in the namenode to reduce the memory burden in Cumulus. The cache module exploits the fact that files are accessed with different frequencies. It reduces the extra memory burden imposed by the coding matrices as the number of files grows large.
2) Designing the parallel regeneration tree algorithm for the node repair process in Simple Regenerating Code. The algorithm uses network topology information to construct regeneration trees for the new node. It reduces the regeneration time of the node repair process compared with the direct regeneration method.
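The idea behind the coding-matrix cache can be illustrated with a minimal sketch. The abstract does not state the eviction policy or the interface of the module in Cumulus, so everything below is an assumption: the class name `CodingMatrixCache`, the `load_matrix` callback (a hypothetical loader that fetches a matrix from slower storage), and the choice of least-recently-used eviction as a simple stand-in for a frequency-aware policy.

```python
from collections import OrderedDict


class CodingMatrixCache:
    """Bounded in-memory cache of per-file coding matrices.

    Assumption: LRU eviction is used here for illustration only; the
    thesis may use a different, frequency-based policy.
    """

    def __init__(self, capacity, load_matrix):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Hypothetical callback: file_id -> coding matrix (e.g. read from disk).
        self.load_matrix = load_matrix
        self._entries = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, file_id):
        """Return the coding matrix for file_id, loading it on a miss."""
        if file_id in self._entries:
            # Mark as most recently used so hot files stay in memory.
            self._entries.move_to_end(file_id)
            self.hits += 1
            return self._entries[file_id]
        self.misses += 1
        matrix = self.load_matrix(file_id)
        self._entries[file_id] = matrix
        if len(self._entries) > self.capacity:
            # Evict the least recently used matrix to bound memory use.
            self._entries.popitem(last=False)
        return matrix

    def __contains__(self, file_id):
        return file_id in self._entries

    def __len__(self):
        return len(self._entries)
```

With such a structure, the memory held by coding matrices is bounded by `capacity` rather than by the total number of files, while frequently accessed files rarely pay the reload cost.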
Keywords/Search Tags: Distributed Storage, Network Coding, Network Topology, Node Repair, Cache