Research On Storage Optimization Technology Based On HDFS In Cloud Environment

Posted on:2020-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:F Z Chen

GTID:2428330590996026

Subject:Software engineering

Abstract/Summary:

Against the background of big data, the value of data is increasingly prominent. As a model for mass data storage, cloud storage has become a research hotspot. HDFS (Hadoop Distributed File System) clusters based on Docker containers have attracted the attention of many researchers because of their high data throughput, rapid cluster deployment, and ability to run on inexpensive devices. However, such clusters suffer from data storage reliability problems, so the data persistence technology and the data replica placement algorithm need to be optimized. Moreover, although block-level backup in an HDFS cluster ensures the security of stored data to a certain extent, an HDFS cluster cannot flexibly back up the various types of data found in a cloud environment, whose storage requirements differ and must be handled correspondingly. The data partitioning algorithm and the backup strategy therefore also need to be optimized. This thesis focuses on storage optimization technology for HDFS in the cloud environment and covers the following three aspects.

Firstly, to address the reliability of data storage on HDFS clusters based on Docker containers, a data persistence technology is proposed that uses data volumes and data volume containers to realize data sharing and data persistence across the containerized HDFS cluster. The persistent data includes the various types of data stored by the cluster and the metadata of each Hadoop cluster node. In addition, a data replica placement algorithm based on HDFS is proposed. When placing backup copies of data blocks, this algorithm jointly considers the performance of the host machine and of the container node, which improves the reliability of cluster data storage and reduces the difference in available storage space between nodes. The experimental results show that the data persistence technology and the data replica placement algorithm can effectively migrate the cluster data, improve the I/O performance of the cluster, and greatly enhance the reliability of data storage.

Secondly, to address the single backup strategy of HDFS clusters, a storage architecture based on Federation HDFS is used in place of the traditional HDFS cluster. Data partitioned by the data partitioning algorithm is stored in this architecture under different storage strategies. A data partitioning algorithm suited to big data environments is also proposed. This algorithm weights the data features and the distances by means of quadratic weights, to preserve efficiency while improving the accuracy of data partitioning. The experimental results show that the algorithm effectively improves the accuracy and efficiency of data partitioning, and that the storage architecture based on Federation HDFS reduces wasted storage space and stores data effectively while implementing flexible storage backup.

Finally, to solve the storage problems described above, a prototype system is designed and implemented, and it is described from four aspects: data storage reliability, data storage memory, data I/O access, and data backup. The system test results demonstrate that, first, the Docker-based HDFS cluster data persistence technology and the data replica placement algorithm ensure persistent data storage and improve data I/O performance; and second, the KNN-based data partitioning algorithm and the Federation HDFS cluster architecture effectively ensure flexible storage backup of data and improve storage space utilization.
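
The abstract does not give the details of the KNN-based data partitioning algorithm, so the following is only a minimal sketch of one common way to weight both features and distances in k-nearest-neighbor classification. Everything in it is an assumption made for illustration: the function name weighted_knn, the feature_weights parameter, the inverse-squared-distance vote, the two example features (file size and daily access count), and the "hot"/"cold" labels. It is not the thesis's method.

```python
import math
from collections import defaultdict


def weighted_knn(samples, labels, query, feature_weights, k=3):
    """Classify `query` with a feature-weighted, distance-weighted k-NN vote.

    Hypothetical helper for illustration only; the weighting scheme is an
    assumption, not taken from the thesis.
    """
    scored = []
    for point, label in zip(samples, labels):
        # Feature weighting: each squared difference is scaled by its weight.
        dist = math.sqrt(
            sum(w * (a - b) ** 2 for w, a, b in zip(feature_weights, point, query))
        )
        scored.append((dist, label))
    # Sort by distance only, so labels never need to be comparable.
    scored.sort(key=lambda pair: pair[0])

    votes = defaultdict(float)
    for dist, label in scored[:k]:
        # Distance weighting: closer neighbors get a larger vote.
        # The small constant avoids division by zero on exact matches.
        votes[label] += 1.0 / (dist ** 2 + 1e-9)
    return max(votes, key=votes.get)


# Hypothetical training data: (file size in MB, accesses per day).
samples = [(1.0, 90.0), (2.0, 80.0), (500.0, 1.0), (450.0, 2.0)]
labels = ["hot", "hot", "cold", "cold"]
# Assumed weights: down-weight size so that access frequency dominates.
feature_weights = (0.001, 1.0)

category = weighted_knn(samples, labels, (3.0, 70.0), feature_weights, k=3)
```

In a setup like the one the abstract describes, a category obtained this way could then select which storage strategy a piece of data receives in the Federation HDFS architecture; that mapping is not shown here.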

Keywords/Search Tags:

Cloud Storage, HDFS, Docker Container, Data Persistence Technology, Data Partitioning

Related items

1	Live Migration System Of Docker Container For Data Center
2	Research On Data Storage Management Technology Of Science And Technology Cloud Platform
3	Research And Optimizing Of Data Storage Under HDFS
4	Research On Data Integrity Verification Technology In Cloud Storage
5	Function Design And Implementation Of Cloud Platform Based On Docker Container
6	Design And Implementation Of Container Migration And Operation And Maintenance Management System Based On Docker Cloud Platform
7	Research Of Cloud Computing Based On Data Storage Technology
8	Design And Implementation Of Cloud Container Management System Based On Microservice Architecture
9	Research On Access Control Technology In HDFS-Based Cloud Storage
10	Design And Implementation Of Container Cloud Platform Based On Kubernetes Technology