Research On High Utilization Rate And Strong Scalability Of HDFS Storage

Posted on: 2020-02-17    Degree: Master    Type: Thesis
Country: China    Candidate: X Zhang    Full Text: PDF
GTID: 2428330590463879    Subject: Computer Science and Technology
Abstract/Summary:
With its high fault tolerance and reliability, HDFS has become the most widely used distributed file system in the field of big data storage. However, as the big data era develops, data volume grows explosively, which requires HDFS to provide higher storage utilization and stronger scalability. Based on these requirements and an in-depth analysis of HDFS, this paper identifies the following three issues:

(1) HDFS achieves data redundancy through a three-replica strategy, which guarantees high reliability of file data. However, the additional replicas are rarely accessed during normal operation, yet they increase storage space and other resource overhead by 200%, so storage space utilization is low.

(2) When HDFS stores a large number of small files, it generates a large amount of metadata and increases the memory consumption and load of the Namenode, which degrades HDFS storage performance.

(3) Metadata in HDFS is stored in two files, FSImage and EditLog, and is managed by loading them into Namenode memory. This file-based metadata management strategy makes the Namenode the bottleneck of HDFS scalability.

To improve the storage space utilization and scalability of HDFS, this paper designs L-HDFS, a highly scalable distributed file system based on HDFS, to solve the above three problems. The research contents and achievements mainly include:

(1) A localized erasure code, CLRC, based on RS codes is proposed to provide HDFS data redundancy. Compared with the multi-replica strategy, it significantly improves storage space utilization. At the same time, the RS code is improved by adding local check blocks, which reduces the number of blocks that must be read for data recovery. Experimental results show that, compared with the plain RS code, CLRC saves bandwidth and I/O during data recovery, has shorter decoding time, and achieves higher data recovery efficiency.

(2) A small-file merge and storage optimization algorithm, FEMA, is proposed. Namenode memory consumption is reduced by merging small files into large files. The index from small files to blocks is built on a logical file name generated by encoding the file ID and block ID, and a caching and prefetching mechanism is introduced to improve small-file access efficiency. Experimental results show that FEMA effectively reduces Namenode memory consumption and provides higher random read performance.

(3) A new metadata management scheme, MBR, based on an RDBMS is proposed to improve HDFS scalability. In the first stage, the process of writing metadata to the RDBMS is designed and implemented. In the second stage, the original HDFS metadata files are abandoned and the read path on the RDBMS is developed, so that HDFS can operate entirely on the newly built integrated metadata base. Experimental results show that the memory consumption of the L-HDFS Namenode does not grow with the number of files or directories, so the cluster can be scaled out further, even allowing distributed deployment across clusters.
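To make the locality idea behind CLRC concrete, the sketch below shows how a local check block lets a single lost block be rebuilt from its own group instead of the full stripe, which is where the bandwidth and I/O savings come from. This is a minimal illustration only, assuming XOR local parities and omitting the global RS parities; the function names are hypothetical and not taken from the thesis.

```python
# Illustrative sketch of local repair with per-group check blocks (not the thesis's CLRC code).
# Assumption: all blocks are equal-length byte strings; local parities are plain XOR.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def encode_local_parities(data_blocks, group_size):
    """Split data blocks into groups and attach one XOR local check block per group."""
    groups = [data_blocks[i:i + group_size]
              for i in range(0, len(data_blocks), group_size)]
    return [(group, xor_blocks(group)) for group in groups]

def repair_single_block(group, local_parity, lost_index):
    """Rebuild one lost block by reading only its own group plus the local check block."""
    survivors = [blk for i, blk in enumerate(group) if i != lost_index]
    return xor_blocks(survivors + [local_parity])

if __name__ == "__main__":
    data = [bytes([i]) * 8 for i in range(6)]          # 6 toy data blocks
    encoded = encode_local_parities(data, group_size=3)
    group, parity = encoded[0]
    rebuilt = repair_single_block(group, parity, lost_index=1)
    assert rebuilt == group[1]                          # recovered by reading 3 blocks, not 6
```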
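The FEMA index maps small files to blocks through a logical file name derived from the file ID and block ID, backed by caching and prefetching. The sketch below shows one plausible encoding and a tiny LRU cache standing in for that mechanism; the field widths, helper names, and cache policy are assumptions for illustration, not the thesis's exact design.

```python
# Illustrative sketch of a FEMA-style small-file index key and cache (assumed design).
from collections import OrderedDict

def logical_name(file_id: int, block_id: int) -> str:
    """Encode file ID and block ID into one logical file name used as the index key."""
    return f"{block_id:08x}-{file_id:08x}"

def decode_logical_name(name: str):
    """Recover (file_id, block_id) from a logical file name."""
    block_hex, file_hex = name.split("-")
    return int(file_hex, 16), int(block_hex, 16)

class PrefetchCache:
    """Tiny LRU cache standing in for caching/prefetching of hot small files."""
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, name):
        if name in self.entries:
            self.entries.move_to_end(name)      # mark as recently used
            return self.entries[name]
        return None

    def put(self, name, data):
        self.entries[name] = data
        self.entries.move_to_end(name)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used entry

if __name__ == "__main__":
    key = logical_name(file_id=42, block_id=7)
    assert decode_logical_name(key) == (42, 7)
    cache = PrefetchCache()
    cache.put(key, b"small file payload")
    assert cache.get(key) == b"small file payload"
```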
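The MBR scheme replaces the FSImage/EditLog files with metadata kept in an RDBMS. As a rough illustration of what such an integrated metadata base might look like, the sketch below stores inodes and block mappings in SQLite and resolves a path to its blocks with a join; the schema, column names, and choice of SQLite are assumptions, not the thesis's actual implementation.

```python
# Illustrative sketch of RDBMS-backed namespace metadata (assumed schema, SQLite for brevity).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER,
    name      TEXT NOT NULL,
    is_dir    INTEGER NOT NULL,
    size      INTEGER DEFAULT 0
);
CREATE TABLE blocks (
    block_id  INTEGER PRIMARY KEY,
    inode_id  INTEGER REFERENCES inodes(id),
    seq       INTEGER NOT NULL
);
""")

# Create /data/file.txt with one block, then resolve the file's blocks by path components.
conn.execute("INSERT INTO inodes VALUES (1, NULL, '/', 1, 0)")
conn.execute("INSERT INTO inodes VALUES (2, 1, 'data', 1, 0)")
conn.execute("INSERT INTO inodes VALUES (3, 2, 'file.txt', 0, 4096)")
conn.execute("INSERT INTO blocks VALUES (1001, 3, 0)")

row = conn.execute(
    "SELECT b.block_id FROM inodes f "
    "JOIN inodes d ON f.parent_id = d.id "
    "JOIN blocks b ON b.inode_id = f.id "
    "WHERE d.name = 'data' AND f.name = 'file.txt'").fetchone()
print(row)  # (1001,)
```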
Keywords/Search Tags:HDFS, erasure code, small file storage, RDBMS, metadata management