
Research And Optimization Of Storage Performance Of Massive Small Files In Cloud Environment

Posted on: 2021-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: M S Dai
Full Text: PDF
GTID: 2428330623968551
Subject: Engineering
Abstract/Summary:
We have now entered the era of cloud computing. The volume of files and user data stored in this environment is growing rapidly, and cloud storage is becoming increasingly important. Cloud storage is a solution for storing massive amounts of data that grew out of cloud computing concepts: its key idea is to use cluster technology and a distributed file system to centrally manage storage resources scattered across the network, thereby meeting the storage needs of data in the cloud environment. HDFS is the most widely used and most mature of the big data storage technologies. However, because of its storage mechanism, HDFS faces the problem of limited NameNode memory when processing large numbers of small files. It is therefore of practical significance to analyze read-optimization strategies and file storage techniques on HDFS, and to discuss big data processing and the handling of massive small files.

The work of this thesis is as follows:

(1) This thesis proposes the PS file merge algorithm, which is designed for storing large numbers of small files and balances the association between data blocks and files. Multiple small files are combined into large files and saved on HDFS, while the index information is stored in Redis; the algorithm records this mapping with as little data as possible. On top of this algorithm, an HMM middle layer is built to process large numbers of small files, and a cache is used when fetching data from HDFS so that reads become more efficient.

(2) Within the Hadoop-based distributed file system framework, small files are first classified by their extensions, then merged into large files after classification is complete, thereby reducing NameNode memory consumption.

(3) By combining the least recently used (LRU) and least frequently used (LFU) algorithms, files read with higher frequency within a certain period are merged and stored in the cache in anticipation of future reads. Small files can then be read without any data interaction with the NameNode, making file reads faster and more efficient.

(4) The thesis makes an in-depth study of how users operate on massive small-file storage platforms and summarizes their functional requirements. Based on the open-source Hadoop framework, the development environment is deployed according to the number, volume, and degree of structure of the data resource files, and a cloud storage platform is built jointly with the Redis in-memory database and the MySQL relational database.
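The merge-and-index idea in point (1) can be sketched as follows. This is a minimal illustration only: a local file stands in for HDFS, a plain Python dict stands in for the Redis index, and the function names are hypothetical, not taken from the thesis.

```python
import os

def merge_small_files(paths, merged_path):
    """Concatenate small files into one large file and return an
    index mapping each original file name to its (offset, length)
    inside the merged file."""
    index = {}
    offset = 0
    with open(merged_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            index[os.path.basename(path)] = (offset, len(data))
            out.write(data)
            offset += len(data)
    return index

def read_small_file(merged_path, index, name):
    """Recover one original small file from the merged file by
    seeking to its recorded offset."""
    offset, length = index[name]
    with open(merged_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Storing only an (offset, length) pair per file keeps the index compact, which matches the thesis's goal of using as little data as possible to record the mapping.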
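The classification step in point (2) — bucketing small files by extension before merging — can be sketched like this. The `"<none>"` label for extensionless files is an assumption made for illustration.

```python
import os
from collections import defaultdict

def group_by_extension(paths):
    """Bucket small-file paths by extension so that each merged
    large file contains files of a single type."""
    groups = defaultdict(list)
    for path in paths:
        ext = os.path.splitext(path)[1].lower() or "<none>"
        groups[ext].append(path)
    return dict(groups)
```

Each bucket can then be handed to the merge step independently, so a merged large file never mixes file types.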
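One way to combine LRU and LFU as described in point (3) is to evict the entry with the lowest access count, breaking ties by least-recent access. This is an illustrative policy under stated assumptions, not necessarily the thesis's exact scheme:

```python
import itertools

class HybridCache:
    """Small cache sketch combining LFU and LRU: the eviction victim
    is the key with the lowest access count; ties go to the key
    whose last access is oldest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}                 # key -> cached value
        self.freq = {}                 # key -> access count (LFU signal)
        self.last = {}                 # key -> logical time of last access (LRU signal)
        self.clock = itertools.count() # monotonically increasing logical time

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1
        self.last[key] = next(self.clock)
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict by (frequency, recency): least frequent first,
            # least recently used among equals.
            victim = min(self.data, key=lambda k: (self.freq[k], self.last[k]))
            for d in (self.data, self.freq, self.last):
                del d[victim]
        self.data[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1
        self.last[key] = next(self.clock)
```

Serving hot files from such a cache avoids a round trip to the NameNode for each small-file read, which is the point of the thesis's caching layer.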
Keywords/Search Tags:HDFS, small files, file merge, cloud storage