Research On Ceph-Oriented Data Deduplication Strategy

Posted on:2023-09-18

Degree:Master

Type:Thesis

Country:China

Candidate:K C Cai

Full Text:PDF

GTID:2568307031989249

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of Internet technology, the era of big data has arrived, bringing more complex data types and larger data volumes. Big data brings diversified commercial value to enterprises, but it also poses major challenges for the storage industry. Cloud storage, built on distributed storage technology, has captured much of the storage market thanks to its high efficiency and low cost. However, both mainstream cloud storage systems and traditional data storage systems inevitably store a large amount of redundant data. In archive and backup systems in particular, redundant data can account for 70% of total storage, which greatly increases storage costs. To reduce hardware cost and energy consumption, storage space utilization must be improved by deduplicating data. When traditional data deduplication is applied to cloud storage, however, two key problems arise: first, how to balance the deduplication rate against system resource utilization while improving the overall efficiency of the system; second, how to effectively reduce the impact of deduplication on the cloud storage system. To address these problems, this thesis makes the following contributions.

1. Existing data deduplication systems pursue the deduplication rate excessively, resulting in low system efficiency and high overhead. To address this problem, a data deduplication mechanism based on data similarity clustering is proposed. The mechanism clusters data by similarity, and a secondary index structure is designed and built from the clustering results. To improve the efficiency of data retrieval and comparison, a cache replacement algorithm is designed around the correlation between data, further improving the index cache hit rate. Comparison with other deduplication mechanisms shows that the proposed mechanism greatly improves system efficiency while maintaining a high deduplication rate.

2. Under the Ceph distributed storage architecture, deduplication not only aggravates the existing load imbalance across OSDs but also degrades Ceph's read and write performance. To address this problem, the mechanism from work 1 is further optimized with an OSD load rebalancing strategy, which takes system read and write performance into account when migrating data so that the overall load of the cluster is more balanced. In addition, optimizing the read and write paths of deduplication reduces the impact on Ceph's read and write efficiency. Compared with the original Ceph distributed storage system, experimental results show that the OSD load rebalancing strategy effectively resolves the Ceph load imbalance caused by deduplication and improves read and write efficiency.

The research shows that the similarity-clustering-based deduplication mechanism and the OSD load rebalancing strategy not only improve deduplication efficiency and space utilization but also greatly reduce Ceph load imbalance, which gives the work good practical significance.
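
As a rough illustration of the first contribution, the sketch below shows one way similarity-based clustering can be combined with a two-level index. It is not the thesis's mechanism, whose details the abstract does not give. It swaps in a simple min-hash style representative (the smallest chunk fingerprint of an object) as the cluster key, and every name in it (`TwoLevelDedupStore`, `CHUNK_SIZE`, `put`, `get`) is hypothetical. Fixed-size chunking and SHA-256 fingerprints are likewise assumptions chosen for brevity.

```python
import hashlib

CHUNK_SIZE = 4  # assumed, deliberately tiny so the example is easy to follow


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(piece: bytes) -> str:
    """Content fingerprint of a chunk (SHA-256 hex digest)."""
    return hashlib.sha256(piece).hexdigest()


class TwoLevelDedupStore:
    """Hypothetical store: level 1 maps a representative fingerprint to a
    cluster; level 2 maps chunk fingerprints to chunk data in that cluster."""

    def __init__(self) -> None:
        self.clusters: dict[str, dict[str, bytes]] = {}
        self.logical_bytes = 0  # bytes written by clients
        self.stored_bytes = 0   # bytes actually kept after deduplication

    def put(self, data: bytes) -> list[tuple[str, str]]:
        """Store an object; return a recipe of (cluster key, fingerprint)."""
        pieces = chunk(data)
        if not pieces:
            return []
        fps = [fingerprint(p) for p in pieces]
        # Assumption: objects sharing the same minimum fingerprint are
        # treated as similar and are deduplicated against one cluster only.
        rep = min(fps)
        cluster = self.clusters.setdefault(rep, {})
        recipe = []
        for fp, piece in zip(fps, pieces):
            self.logical_bytes += len(piece)
            if fp not in cluster:  # new chunk within this cluster
                cluster[fp] = piece
                self.stored_bytes += len(piece)
            recipe.append((rep, fp))
        return recipe

    def get(self, recipe: list[tuple[str, str]]) -> bytes:
        """Rebuild an object from its recipe."""
        return b"".join(self.clusters[rep][fp] for rep, fp in recipe)


store = TwoLevelDedupStore()
first = store.put(b"AAAABBBBCCCC")
second = store.put(b"AAAABBBBCCCC")  # exact duplicate: nothing new is stored
third = store.put(b"AAAABBBBDDDD")   # similar object: may or may not share a cluster
```

Because each object is compared only against the chunks of a single cluster, lookups touch a small part of the index, at the price of missing duplicates that land in a different cluster. That is the kind of trade-off between deduplication rate and system overhead that the abstract describes.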

Keywords/Search Tags:

deduplication, ceph, similarity, load balancing

Related items

1	Optimizing Data Placement Of MapReduce On Ceph-based Framework Under Load-balancing Constraint
2	Research And Implementation Of OpenStack Storage Optimization And Load Balancing
3	Research And Implementation Of Ceph Tiered Storage Optimization Strategy
4	Research And Optimization Of Mass Small File Access Performance Based On Ceph
5	The Research Of Load-balancing Algorithm Based On OpenStack
6	Research And Improvement Of CRUSH Algorithm In Ceph Distributed Storage System
7	Research On Efficient And Scalable Fine-grained Similar Image Deduplication Storage System
8	Research On The Optimization Of Storage Performance Of Massive Chinese Text Small Files In Ceph
9	Research On Key Technologies Of Load Balancing Based On OpenStack
10	Design And Implementation Of Web Load Balancing System