Research And Application Of Small Files Access Optimization Method

Posted on: 2017-05-04  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Song
GTID: 2308330503453767  Subject: Computer Science and Technology
Abstract/Summary:
With the advance of information technology and the rapid development of the Internet, business and personal data are growing explosively. According to relevant research, the global volume of data will reach 35 ZB by 2020. The big data era has arrived, and traditional storage methods can no longer meet the requirements of massive data storage. Distributed file systems, represented by HDFS, offer a new model for massive data access in the big data era thanks to their high reliability, high scalability, high fault tolerance, and low cost. However, when handling small files, HDFS suffers from low access efficiency, heavy storage consumption for metadata, and high data redundancy. The study and optimization of storage methods for massive numbers of small files has therefore become a hot research topic at home and abroad.

This thesis comprehensively analyzes the HDFS distributed file system, introduces deduplication technology, and examines the shortcomings of HDFS. Corresponding processing strategies are adopted for the large number of small files present on the network and for data deduplication, respectively. The main research contents and innovations are as follows:

(1) A small-file merging algorithm based on similarity is proposed.
First, a strategy is designed to extract keywords from each file, compute the similarity between files from their keywords using the Hamming distance, and merge similar small files into a large file that is uploaded to HDFS. Building on this merging scheme, the thesis analyzes the metadata structure and storage location of small files and designs the read and write flows for small files in detail. This effectively reduces the system's I/O operations, eases the pressure of metadata storage on the NameNode, and indirectly increases the storage capacity of the system.

(2) To address data redundancy in the system, an optimized algorithm, IOTD, is proposed based on the TTTD algorithm. It significantly reduces the size of file blocks and improves the data deduplication rate. In addition, to speed up lookups in the index table, an RUH table is introduced, and the MapReduce programming model is applied to the RUH table to reduce index query time.

Experimental results show that the proposed scheme effectively reduces NameNode memory usage, improves storage efficiency for redundant data, and greatly improves the management performance of small files.
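The similarity-based merging described in item (1) can be illustrated with a minimal Python sketch. The abstract does not say how keywords are extracted or encoded, so the details here are assumptions made for illustration only: keywords are taken as the most frequent words, each file gets a SimHash-style 64-bit fingerprint, and files whose fingerprints lie within a Hamming-distance threshold are merged into one blob with a name-to-offset index. The function names, the threshold, and the in-memory blob (standing in for the large file uploaded to HDFS) are all hypothetical and not taken from the thesis.

```python
import hashlib
from collections import Counter


def keywords(text, top_k=8):
    """Hypothetical keyword extractor: the top_k most frequent longer words."""
    words = [w.lower() for w in text.split() if len(w) > 3]
    return [w for w, _ in Counter(words).most_common(top_k)]


def fingerprint(words, bits=64):
    """SimHash-style fingerprint of a keyword list (an assumed encoding)."""
    votes = [0] * bits
    for w in words:
        h = int.from_bytes(hashlib.md5(w.encode()).digest()[:8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def group_similar(files, max_distance=16):
    """Greedily place each file in the first group whose representative
    fingerprint is within max_distance; otherwise start a new group."""
    groups = []  # list of (representative fingerprint, [file names])
    for name, data in files.items():
        fp = fingerprint(keywords(data.decode("utf-8", errors="ignore")))
        for rep, names in groups:
            if hamming(rep, fp) <= max_distance:
                names.append(name)
                break
        else:
            groups.append((fp, [name]))
    return [names for _, names in groups]


def merge(files, names):
    """Concatenate small files into one blob; index maps name -> (offset, length)."""
    blob = bytearray()
    index = {}
    for name in names:
        index[name] = (len(blob), len(files[name]))
        blob += files[name]
    return bytes(blob), index


def read_small_file(blob, index, name):
    """Recover one small file from the merged blob through the index."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

In a sketch like this, only the merged blob and its small index would need to be tracked per group, which is the intuition behind the reduced NameNode metadata load; the thesis's actual metadata layout and read/write flows may differ.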
Keywords/Search Tags: Small file, access optimization, HDFS, data deduplication