| With the development of cloud computing and big data, reliable mass data storage has become a hot topic. Building networked distributed storage from inexpensive commodity servers is an alternative to traditional, expensive storage arrays. Distributed storage systems usually adopt a multi-replication strategy, which offers good load balancing but at the cost of large storage overhead and poor fault tolerance. Erasure coding has been deployed to address these problems, but it has the drawback of consuming a large amount of bandwidth during data recovery. Regenerating codes have recently been proposed to save network bandwidth at the cost of additional computation. In practice, the access frequency of a file differs across the stages of its life cycle, so an adaptive storage strategy should be applied to suit each stage: multi-replication for hot data to balance load, erasure coding for cold data to save storage space, and deduplication for archival data to save even more space. This paper aims to build Cumulus, an adaptive storage system based on HDFS that uses multiple coding schemes. The main contributions are as follows: 1) Implementing a flexible framework that supports multiple storage strategies through the abstraction of a coding matrix, so that a suitable strategy can be applied according to the state of the file. 2) Enhancing the read, write, and recovery performance of Cumulus. 3) Presenting the Data Loss Severity metric to compensate for the negative impact of deduplication on data reliability. |
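
The coding-matrix abstraction mentioned in contribution 1 can be illustrated with a minimal sketch. The abstract does not say how Cumulus represents its matrices, so everything below is an assumption made for illustration: the function names are hypothetical, and the arithmetic is restricted to GF(2) (plain XOR), whereas practical erasure codes such as Reed-Solomon typically work over larger finite fields. The point is only that replication and a parity code can both be written as a 0/1 matrix applied to the same data blocks.

```python
# Illustrative sketch only: these names are hypothetical, not Cumulus or HDFS APIs.


def identity_matrix(k):
    """k x k identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


def replication_matrix(k, copies):
    """Coding matrix for `copies` full replicas: the identity stacked `copies` times."""
    return [row for _ in range(copies) for row in identity_matrix(k)]


def single_parity_matrix(k):
    """Coding matrix for a (k+1, k) XOR parity code: identity plus an all-ones row."""
    return identity_matrix(k) + [[1] * k]


def xor_blocks(blocks, size):
    """Bytewise XOR of equal-length blocks; returns a zero block if `blocks` is empty."""
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(matrix, data_blocks):
    """Apply a 0/1 coding matrix to the data blocks over GF(2).

    Each matrix row yields one stored block: the XOR of the data blocks
    whose coefficient in that row is 1.
    """
    size = len(data_blocks[0])
    return [
        xor_blocks([blk for coef, blk in zip(row, data_blocks) if coef], size)
        for row in matrix
    ]


def storage_overhead(matrix):
    """Stored blocks per data block, i.e. rows divided by columns."""
    return len(matrix) / len(matrix[0])


data = [b"abcd", b"efgh", b"ijkl"]
hot = encode(replication_matrix(3, 3), data)    # triple replication
cold = encode(single_parity_matrix(3), data)    # three data blocks + one parity

# Recover data block 1 of the parity code from the two surviving data blocks
# and the parity block.
recovered = xor_blocks([cold[0], cold[2], cold[3]], 4)
```

Under these assumptions, switching a file from the hot to the cold strategy amounts to re-encoding it with a different matrix: the overhead drops from 3 stored blocks per data block to 4/3, while recovering one lost block requires reading all three surviving blocks, which reflects the recovery-bandwidth drawback described above.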