
Research And Optimization Of Storage Performance Of Massive Small Files In Cloud Environment

Posted on: 2021-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: M S Dai
Full Text: PDF
GTID: 2428330623968551
Subject: Engineering
Abstract/Summary:
We have now entered the era of cloud computing. The volume of files and user data stored in this environment is growing rapidly, and cloud storage is becoming increasingly important. Cloud storage is a solution for storing massive amounts of data that grew out of cloud computing concepts: its key idea is to use cluster technology and a distributed file system to centrally manage storage resources scattered across the network, thereby meeting the storage needs of data in the cloud environment. HDFS is the most widely used and most mature of the big data storage technologies. However, because of its storage mechanism, HDFS faces the problem of limited NameNode memory when processing large numbers of small files. It is therefore of practical significance to analyze read-optimization strategies and file storage techniques on HDFS, and to discuss big data processing and the handling of massive small files.

The work of this thesis is as follows:

(1) This thesis proposes the PS file merge algorithm, which is designed for storing large numbers of small files and balances the association between data blocks and files. Multiple small files are combined into large files and saved on HDFS, while the index information is stored in Redis; the algorithm records this mapping with as little data as possible. On top of this algorithm, an HMM middle layer is built to process large numbers of small files, and a cache is used when fetching data from HDFS so that reads become more efficient.

(2) Within the Hadoop-based distributed file system framework, small files are first classified by their extensions, then merged into large files after classification is complete, thereby reducing NameNode memory consumption.

(3) By combining the least recently used (LRU) and least frequently used (LFU) algorithms, files read with higher frequency within a certain period are merged and stored in the cache in anticipation of future reads. Small files can then be read without any data interaction with the NameNode, making file reads faster and more efficient.

(4) The thesis makes an in-depth study of how users operate on massive small-file storage platforms and summarizes their functional requirements. Based on the open-source Hadoop framework, the development environment is deployed according to the number, volume, and degree of structure of the data resource files, and a cloud storage platform is built jointly with the Redis in-memory database and the MySQL relational database.
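The merge-and-index idea in point (1) can be sketched as follows. This is a minimal illustration only: a local file stands in for HDFS, a plain Python dict stands in for the Redis index, and the function names are hypothetical, not taken from the thesis.

```python
import os

def merge_small_files(paths, merged_path):
    """Concatenate small files into one large file and return an
    index mapping each original file name to its (offset, length)
    inside the merged file."""
    index = {}
    offset = 0
    with open(merged_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            index[os.path.basename(path)] = (offset, len(data))
            out.write(data)
            offset += len(data)
    return index

def read_small_file(merged_path, index, name):
    """Recover one original small file from the merged file by
    seeking to its recorded offset."""
    offset, length = index[name]
    with open(merged_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Storing only an (offset, length) pair per file keeps the index compact, which matches the thesis's goal of using as little data as possible to record the mapping.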
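The classification step in point (2) — bucketing small files by extension before merging — can be sketched like this. The `"<none>"` label for extensionless files is an assumption made for illustration.

```python
import os
from collections import defaultdict

def group_by_extension(paths):
    """Bucket small-file paths by extension so that each merged
    large file contains files of a single type."""
    groups = defaultdict(list)
    for path in paths:
        ext = os.path.splitext(path)[1].lower() or "<none>"
        groups[ext].append(path)
    return dict(groups)
```

Each bucket can then be handed to the merge step independently, so a merged large file never mixes file types.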
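One way to combine LRU and LFU as described in point (3) is to evict the entry with the lowest access count, breaking ties by least-recent access. This is an illustrative policy under stated assumptions, not necessarily the thesis's exact scheme:

```python
import itertools

class HybridCache:
    """Small cache sketch combining LFU and LRU: the eviction victim
    is the key with the lowest access count; ties go to the key
    whose last access is oldest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}                 # key -> cached value
        self.freq = {}                 # key -> access count (LFU signal)
        self.last = {}                 # key -> logical time of last access (LRU signal)
        self.clock = itertools.count() # monotonically increasing logical time

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1
        self.last[key] = next(self.clock)
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict by (frequency, recency): least frequent first,
            # least recently used among equals.
            victim = min(self.data, key=lambda k: (self.freq[k], self.last[k]))
            for d in (self.data, self.freq, self.last):
                del d[victim]
        self.data[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1
        self.last[key] = next(self.clock)
```

Serving hot files from such a cache avoids a round trip to the NameNode for each small-file read, which is the point of the thesis's caching layer.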
Keywords/Search Tags:HDFS, small files, file merge, cloud storage