The Research And Implementation Of Mass Small File Storage System

Posted on: 2019-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: J J Xu
Full Text: PDF
GTID: 2348330566465755
Subject: Computer technology
Abstract/Summary:
With the popularity and rapid development of Web 2.0, data volumes have grown exponentially. To store and manage this data effectively, many research institutions and companies have proposed distributed storage solutions. The most influential of these is the Hadoop Distributed File System (HDFS), which is widely used in academia and industry. However, HDFS is designed primarily for streaming access to large files and suffers a performance penalty when managing massive numbers of small files. This paper is based on the Picture Retrieval System and the Zhonghuaziku Project, both of which must handle large numbers of small files. To meet the needs of these projects, we build a Massive Small File Storage System on top of HDFS. By analyzing the structure of HDFS, we identify why it handles massive numbers of small files poorly. We then propose an approach called HIFM (Hierarchy Index File Merging) to improve the efficiency of storing and accessing small files on HDFS. HIFM is based on merging small files and has five main parts. First, by concatenating small files into large ones, HIFM reduces the number of files stored, which in turn reduces memory overhead. Second, two-level index files are produced during merging, and both centralized and distributed storage methods are used to manage them. Third, all index files are preloaded to reduce read cost. Fourth, under-filled merged files can grow dynamically: to reduce the NameNode's memory cost, a small number of newly stored small files can be appended to an under-filled file. Fifth, a prefetching mechanism is used to improve the efficiency of accessing small files sequentially. Finally, we build a Massive Small File Storage System based on HDFS and HIFM and test and analyze its performance. The results show that HIFM improves the efficiency of storing and accessing small files on HDFS and noticeably reduces the load on the NameNode and DataNodes. The average time to read a random small file in the system is about 20 milliseconds, which meets the requirements of the application.
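The merging and indexing steps described in the abstract can be illustrated with a minimal in-memory sketch. This is not the thesis's implementation and does not use HDFS: the class names `MergedFile` and `SmallFileStore`, the `capacity` parameter, and the layout of the two index levels (a global map from file name to merged file, and a per-merged-file map from file name to offset and length) are all assumptions made for illustration. Preloading and prefetching are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class MergedFile:
    """Hypothetical container: many small files concatenated into one."""
    capacity: int
    data: bytearray = field(default_factory=bytearray)
    # Second-level index (assumed layout): name -> (offset, length).
    index: dict = field(default_factory=dict)

    def free_space(self) -> int:
        return self.capacity - len(self.data)

    def append(self, name: str, payload: bytes) -> None:
        # Record where the payload starts, then concatenate it.
        offset = len(self.data)
        self.data.extend(payload)
        self.index[name] = (offset, len(payload))


class SmallFileStore:
    """Hypothetical store that merges small files and keeps a two-level index."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.containers: list[MergedFile] = []
        # First-level index (assumed layout): name -> merged-file id.
        self.global_index: dict[str, int] = {}

    def put(self, name: str, payload: bytes) -> int:
        if name in self.global_index:
            raise KeyError(f"{name!r} is already stored")
        if len(payload) > self.capacity:
            raise ValueError("payload is larger than one merged file")
        # Reuse the first under-filled merged file with enough room;
        # otherwise open a new one.
        for cid, container in enumerate(self.containers):
            if container.free_space() >= len(payload):
                break
        else:
            self.containers.append(MergedFile(self.capacity))
            cid = len(self.containers) - 1
        self.containers[cid].append(name, payload)
        self.global_index[name] = cid
        return cid

    def get(self, name: str) -> bytes:
        # Two lookups: which merged file, then which byte range inside it.
        cid = self.global_index[name]
        offset, length = self.containers[cid].index[name]
        return bytes(self.containers[cid].data[offset:offset + length])
```

In this sketch the number of large objects grows with the total bytes stored rather than with the number of small files, which is the effect the merging step aims for; a random read costs two dictionary lookups plus one slice.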
Keywords/Search Tags: HDFS, Small files, Two-level index files, Preloading, Prefetching