Research And Application Of Distributed Storage Based On HDFS

Posted on: 2013-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: M Tong
Full Text: PDF
GTID: 2248330392457242
Subject: Software engineering
Abstract/Summary:
The development of information technology has made large-scale data storage increasingly common. Data must be retained for long periods and its volume is growing sharply, but traditional file systems cannot meet the resulting requirements for storage capacity, storage efficiency, and storage security. Distributed storage can hold massive amounts of data and is an important technical approach to mass data storage. In recent years, Hadoop, a solution for storing and processing large data sets, has been favored by major companies at home and abroad. The Hadoop Distributed File System (HDFS), the core of Hadoop, can serve as a large-scale distributed data storage solution.

This thesis studies distributed storage based on HDFS, including the small-files problem, the ReplicationTargetChooser and Rack-Awareness strategies, the NameNode backup and recovery strategy, and the scalability of HDFS. There are three ways to address the small-files problem in HDFS: Hadoop Archive, Sequence File, and CombineFileInputFormat. The ReplicationTargetChooser and Rack-Awareness strategies let the NameNode obtain the NetworkTopology of the DataNodes and choose where replicas are placed, ensuring data reliability while taking data transmission efficiency into account. NameNode backup and recovery protects the metadata by backing it up and forming checkpoints periodically. If the NameNode fails, this shortens the time needed to restart it and can even restore lost data. HDFS scalability is reflected in the ability to add DataNodes dynamically, which allows the cluster to meet the growing demand for large-scale data storage.

Finally, an application is built on an HDFS cluster, and the file transfer efficiency of the HDFS cluster is compared with that of FTP, demonstrating the feasibility of large-scale data storage solutions based on HDFS.
Keywords/Search Tags: Distributed Storage, Distributed File System, Hadoop Distributed File System, Storage of copies
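
The rack-aware replica placement mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's code or Hadoop's ReplicationTargetChooser; it only mirrors the commonly documented HDFS default for three replicas (first replica on the writer's node, second on a node in a different rack, third on another node in the same rack as the second). The function name `choose_replica_targets`, the DataNode names, and the rack ids are hypothetical, and nodes are picked in sorted order rather than randomly so the result is reproducible.

```python
from typing import Dict, List


def choose_replica_targets(writer: str,
                           topology: Dict[str, str],
                           replication: int = 3) -> List[str]:
    """Pick DataNodes for one block using a simplified rack-aware policy.

    topology maps a DataNode name to its rack id (both hypothetical).
    Assumption: candidates are taken in sorted order; real HDFS chooses
    randomly and also weighs load and free space.
    """
    if writer not in topology:
        raise ValueError(f"unknown writer node: {writer}")

    # Replica 1: the node the client is writing from.
    targets = [writer]
    local_rack = topology[writer]

    # Replica 2: a node on a different rack, so a rack failure
    # cannot destroy every copy of the block.
    remote = [n for n in sorted(topology) if topology[n] != local_rack]
    if replication >= 2 and remote:
        second = remote[0]
        targets.append(second)

        # Replica 3: another node on the same rack as replica 2,
        # which keeps the third transfer inside one rack.
        if replication >= 3:
            same_rack = [n for n in remote
                         if topology[n] == topology[second] and n != second]
            if same_rack:
                targets.append(same_rack[0])

    # Fallback: fill any remaining slots with unused nodes.
    for node in sorted(topology):
        if len(targets) >= replication:
            break
        if node not in targets:
            targets.append(node)

    return targets[:replication]


if __name__ == "__main__":
    # Hypothetical four-node, two-rack cluster.
    cluster = {"dn1": "/rack1", "dn2": "/rack1",
               "dn3": "/rack2", "dn4": "/rack2"}
    print(choose_replica_targets("dn1", cluster))
```

The trade-off the sketch shows is the one the abstract names: two racks hold the data for reliability, while only one copy crosses racks for transmission efficiency.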