| With the growing volume of data to be managed, distributed storage systems offer a reliable platform for storing huge amounts of data across a set of storage nodes distributed over a network. Ensuring data reliability against storage node failures requires the introduction of redundancy. To maintain the required redundancy, the system must support data recovery (or repair), which involves reading data from surviving nodes and reconstructing the lost data on new nodes. Replication is the simplest redundancy scheme and is used by many large storage systems. Erasure coding is another approach to redundancy. Regenerating codes are a further option: erasure codes based on network coding, designed for fast data repair in distributed storage systems. With conventional erasure coding, a failed storage node has to be repaired by downloading the whole file from other nodes, each storing a fraction of the data, and then re-encoding the data. The bandwidth required to recover the data stored in a single node, which holds only a fraction of the entire file, is therefore wasteful. This research work presents a practical network coding approach for the Google File System. It compares network coding with the replication scheme that the Google File System uses to provide redundancy. We study the performance of the two schemes by evaluating the probability of failure of any chunk and the ability to recover the original data from the surviving data chunks. We also evaluate the average bandwidth, measured as the number of transmissions required when a client requests to read data. We observe that, with network coding, the system is more robust and resilient to failure and provides better performance than with the replication scheme. |
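
The comparison of chunk failure probability described above can be illustrated with a minimal sketch. It is not the evaluation model used in this work; it rests on assumptions made only for illustration: every storage node fails independently with the same probability `p`, a replicated chunk is lost only when all of its replicas are lost, and a coded chunk is stored as `n` coded blocks of which any `k` suffice to decode (an idealised property that random linear network coding achieves only with high probability over a large field). The function names and parameter values are hypothetical.

```python
from math import comb


def replication_loss_probability(p: float, replicas: int) -> float:
    """Probability that a chunk is lost under replication.

    Assumption: nodes fail independently with probability p, and the
    chunk is lost only if every replica fails.
    """
    return p ** replicas


def coded_loss_probability(p: float, n: int, k: int) -> float:
    """Probability that a chunk is lost under an idealised (n, k) code.

    Assumption: any k of the n coded blocks are enough to decode, so the
    chunk is lost when more than n - k blocks fail.
    """
    return sum(
        comb(n, failed) * p ** failed * (1 - p) ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


def storage_overhead(n: int, k: int) -> float:
    """Stored data divided by original data (replication is n copies, k = 1)."""
    return n / k


if __name__ == "__main__":
    p = 0.1  # illustrative per-node failure probability, not a measured value
    # Both configurations below store three times the original data.
    print("3-way replication:", replication_loss_probability(p, 3))
    print("(9, 3) coding:    ", coded_loss_probability(p, 9, 3))
```

Under these assumptions, replication is the special case `k = 1` of the coded scheme, so the two functions can be compared at equal storage overhead by choosing `n / k` equal to the number of replicas.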