Research And Implementation Of Hadoop Distributed File System Optimization Method Based On Network Coding

Posted on: 2020-04-21 | Degree: Master | Type: Thesis
Country: China | Candidate: H J Cui
GTID: 2428330578957100 | Subject: Communication and Information System
Abstract/Summary:
With the rapid development of Internet technology and the explosive growth of digital, scientific, and other kinds of data, the storage and management of massive data has become an important research topic. Distributed storage systems are widely used because of their large storage capacity, strong scalability, and other performance advantages. They often rely on a replication mechanism to provide redundancy, which incurs high storage cost and high repair bandwidth cost. To improve overall system performance, network coding has been introduced into distributed storage. Although network coding can significantly improve system performance, a repair operation must connect to many nodes, which greatly increases disk I/O. At the same time, data encryption in network-coding-based distributed storage systems has received considerable attention. To address the fault-tolerance and data-encryption problems of distributed storage systems, this thesis takes the Hadoop Distributed File System (HDFS) as the target system and studies the performance and data encryption of a distributed file storage system that applies erasure codes and network coding. The main work of the thesis is as follows:

(1) When network coding is applied in a distributed storage system, repairing failed data incurs excessive disk I/O. To address this, we improve the systematic minimum storage regenerating (MSR) code and, targeting its local repair, propose a local repair code based on the systematic MSR code. The replication mechanism, an erasure code mechanism, and the proposed local repair coding mechanism are each applied to HDFS, and their storage overhead, repair bandwidth overhead, and disk I/O overhead are studied. Theoretical analysis and experiments show that the proposed method significantly reduces disk I/O at the cost of some additional storage overhead.

(2) Because the overhead of encrypting data in a distributed file system is too large, we study the encryption mechanism of network-coding-based HDFS and propose a lightweight encryption mechanism. Instead of encrypting all of the data or all of the encoded data, this mechanism encrypts only the encoding matrix used in the encoding process, combining the network coding and encryption operations. Data analysis and experiments show that, while preserving system security, this method reduces the amount of data that must be encrypted and improves the efficiency of the whole system.
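The trade-off described in (1) can be illustrated with the standard per-repair cost formulas for each family of schemes. This is a hedged sketch, not the thesis's construction: it normalizes one stored block to one unit, uses the usual MSR operating point (each node stores alpha = 1 and each of d helpers sends beta = 1/(d - k + 1)), assumes MSR helpers must read their whole alpha from disk, and stands in for the proposed local repair code with a hypothetical LRC-style layout (one XOR parity per group of g data blocks), because the abstract does not describe the actual construction. The parameters k = 6, n = 9, d = 8, g = 3 are likewise assumptions.

```python
from fractions import Fraction
from typing import NamedTuple


class Cost(NamedTuple):
    storage: Fraction    # units stored for k units of original data (1 unit = 1 block)
    repair_bw: Fraction  # units sent over the network to rebuild one lost node
    disk_io: Fraction    # units read from helper disks during that repair


def replication(k: int, r: int) -> Cost:
    # r copies of every block; a lost block is copied from one surviving replica.
    return Cost(Fraction(r * k), Fraction(1), Fraction(1))


def reed_solomon(n: int, k: int) -> Cost:
    # (n, k) MDS erasure code: rebuilding one block downloads k whole blocks.
    return Cost(Fraction(n), Fraction(k), Fraction(k))


def msr(n: int, k: int, d: int) -> Cost:
    # MSR point: each node stores alpha = 1; each of d helpers sends beta = 1 / (d - k + 1).
    # ASSUMPTION: each helper reads its whole alpha from disk to compute beta.
    if not k <= d <= n - 1:
        raise ValueError("need k <= d <= n - 1")
    beta = Fraction(1, d - k + 1)
    return Cost(Fraction(n), d * beta, Fraction(d))


def local_groups(n: int, k: int, g: int) -> Cost:
    # HYPOTHETICAL LRC-style layout, not the thesis's code: on top of an (n, k) MDS code,
    # add one XOR parity per group of g data blocks, so a lost data block is rebuilt
    # from its g - 1 group peers plus the local parity.
    if k % g:
        raise ValueError("g must divide k")
    return Cost(Fraction(n + k // g), Fraction(g), Fraction(g))


if __name__ == "__main__":
    k, n, d, g = 6, 9, 8, 3  # hypothetical parameters
    schemes = {
        "3-replica": replication(k, 3),
        "RS(9,6)": reed_solomon(n, k),
        "MSR(9,6,8)": msr(n, k, d),
        "RS+local": local_groups(n, k, g),
    }
    for name, c in schemes.items():
        print(f"{name:11} storage={float(c.storage):5.2f} "
              f"repair_bw={float(c.repair_bw):5.2f} disk_io={float(c.disk_io):5.2f}")
```

With these parameters, MSR transfers 8/3 units per repair versus 6 for RS(9,6) but reads 8 units from disk versus 6, which is the disk I/O penalty the abstract points to; the local-group layout reads only 3 units while storing 11 units instead of 9, mirroring the stated trade of extra storage for less disk I/O.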
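The lightweight encryption idea in (2) can be sketched as follows, assuming random linear network coding over the prime field GF(257). Only the n x k coefficient (encoding) matrix is encrypted; the coded blocks are stored in plaintext, and the decoder in this sketch needs the decrypted coefficients to invert them. Because Python's standard library has no block cipher, the sketch uses a toy SHA-256 counter-mode keystream as a placeholder for a real cipher such as AES. The helper names (`keystream`, `xor_cipher`, `pack`, `unpack`) and the key are hypothetical, and the sketch does not reproduce or evaluate the thesis's security analysis.

```python
import hashlib
import random

P = 257  # prime field GF(257): every data byte 0..255 is a valid element


def keystream(key: bytes, length: int) -> bytes:
    # TOY stand-in for a real cipher such as AES-CTR: SHA-256(key || counter).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same key decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def encode(blocks, coeffs):
    # Each coded block is a GF(P) linear combination of the k data blocks.
    length = len(blocks[0])
    return [[sum(c * b[j] for c, b in zip(row, blocks)) % P for j in range(length)]
            for row in coeffs]


def solve(coeffs, coded):
    # Gauss-Jordan elimination over GF(P) on the k x k system [coeffs | coded].
    k = len(coeffs)
    aug = [list(r) + list(c) for r, c in zip(coeffs, coded)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col]), None)
        if pivot is None:
            return None  # singular: these k coded blocks cannot be decoded
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # Fermat inverse, valid since P is prime
        aug[col] = [x * inv % P for x in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % P for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def pack(matrix) -> bytes:
    # Serialize coefficients (< 257) as 2-byte big-endian integers.
    return b"".join(x.to_bytes(2, "big") for row in matrix for x in row)


def unpack(raw: bytes, cols: int):
    vals = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    return [vals[i:i + cols] for i in range(0, len(vals), cols)]


if __name__ == "__main__":
    rng = random.Random(0)
    key = b"demo-key"  # hypothetical key; a real system would use a key manager
    data = [list(b"HDFS block A!"), list(b"HDFS block B!"), list(b"HDFS block C!")]
    k, n = len(data), 5
    while True:  # redraw until the first k coded blocks are decodable
        coeffs = [[rng.randrange(1, P) for _ in range(k)] for _ in range(n)]
        if solve(coeffs[:k], encode(data, coeffs[:k])) is not None:
            break
    coded = encode(data, coeffs)            # stored in plaintext
    sealed = xor_cipher(pack(coeffs), key)  # the only encrypted object
    restored = unpack(xor_cipher(sealed, key), k)
    recovered = solve(restored[:k], coded[:k])
    assert [bytes(r) for r in recovered] == [bytes(b) for b in data]
    print(f"encrypted {len(sealed)} bytes of coefficients for {n} coded blocks")
```

The encrypted object is 2 * n * k bytes (30 bytes here) regardless of how long each block is, whereas encrypting the coded data would cost n * L symbols for blocks of length L; for HDFS-sized blocks this is the reduction in encrypted volume the abstract describes.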
Keywords/Search Tags: Network coding, HDFS, Local repair, Encryption methods