With the rapid growth of cloud computing and big data technologies, cloud storage has become an important storage paradigm for massive data and has received extensive attention from both industry and academia. The cloud file system, as a core component of a cloud storage system, provides the underlying storage layer and is responsible for storing data effectively and reliably, thereby ensuring the reliability and stability of the whole system. However, as storage clusters grow in size, node failures become the norm rather than the exception. Erasure coding, a redundancy technique with strong fault tolerance and high space utilization, has therefore been gradually adopted in cloud file systems. In erasure-coded file systems, exploiting the characteristics of the erasure code to achieve efficient, balanced, and fault-tolerant storage of large-scale data has become a research focus. Accordingly, this thesis studies three aspects of erasure-coded cloud file systems: load balancing, data placement, and data recovery. The main contributions are as follows:

1) A method for evaluating the load capacity of storage nodes is proposed. The method identifies the main factors that influence node load capacity, models them with the analytic hierarchy process (AHP) to derive the weight of each factor, and defines a formula for computing node load capacity together with a load classification scheme.

2) A load-balancing data placement algorithm, BDPA, is proposed. Building on the load-capacity evaluation method, the algorithm allocates data blocks according to the real-time load of each node and ensures that each node stores at most one block of a given file.
The algorithm not only ensures data reliability but also balances load across storage nodes and accelerates data writes.

3) A topology-aware data recovery algorithm is proposed. Based on the characteristics of erasure-code recovery, the algorithm distinguishes original data blocks from encoded (parity) blocks during repair and, guided by the network topology, selects blocks close to the access node or the recovering client. This reduces network transmission overhead and speeds up data recovery.

4) An erasure-coded file system, EC-HDFS, is designed and implemented, and is evaluated against Facebook's open-source HDFS-RAID. The results show that the proposed load-balancing placement algorithm and topology-aware recovery algorithm achieve better load balance across system nodes, improving data write speed by about 12% and data recovery speed by about 15%.
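The AHP-based load evaluation of contribution 1) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the choice of factors (CPU, memory, disk I/O, network), the pairwise comparison matrix, and the classification thresholds are all assumptions made for the example.

```python
from math import prod

# Illustrative load factors; the thesis's actual factor set may differ.
FACTORS = ["cpu", "memory", "disk_io", "network"]

# Hypothetical AHP pairwise comparison matrix: A[i][j] says how much
# more factor i influences node load than factor j.
A = [
    [1,   2,   3,   4],
    [1/2, 1,   2,   3],
    [1/3, 1/2, 1,   2],
    [1/4, 1/3, 1/2, 1],
]

def ahp_weights(matrix):
    """Approximate the principal eigenvector (the AHP factor weights)
    by the geometric-mean method, then normalize to sum to 1."""
    n = len(matrix)
    gm = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

def load_score(utilization, weights):
    """Weighted sum of per-factor utilizations in [0, 1]."""
    return sum(w * utilization[f] for w, f in zip(weights, FACTORS))

def classify(score):
    """Map a score to a coarse load class (thresholds are assumptions)."""
    if score < 0.4:
        return "light"
    if score < 0.7:
        return "normal"
    return "heavy"
```

A placement algorithm in the spirit of BDPA could then rank candidate nodes by `load_score` and assign each block of a file to the lightest-loaded node that does not already hold a block of that file.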
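The topology-aware selection of contribution 3) might look like the sketch below. The distance model (same node < same rack < cross-rack), the `/rack/node` path encoding, and the rule of preferring original data blocks over parity blocks (to avoid decoding when possible) are illustrative assumptions, not details taken from the thesis.

```python
def distance(a, b):
    """Crude topology distance between two "/rackX/nodeY" locations:
    0 = same node, 1 = same rack, 2 = different rack (an assumption)."""
    if a == b:
        return 0
    if a.split("/")[1] == b.split("/")[1]:  # compare rack components
        return 1
    return 2

def pick_blocks(available, k, client):
    """Pick k blocks for recovery, preferring original data blocks
    over parity blocks, and closer blocks over distant ones."""
    ranked = sorted(
        available,
        key=lambda blk: (blk["is_parity"], distance(blk["node"], client)),
    )
    return ranked[:k]
```

Ranking on the tuple `(is_parity, distance)` means a nearby original block always beats a parity block at the same distance, which matches the abstract's goal of cutting both decoding work and network transfer during repair.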