Research On Data Deduplication Method Of Key-value Storage System Based On LSM Tree

Posted on:2021-03-19

Degree:Master

Type:Thesis

Country:China

Candidate:Z K Zhao

Full Text:PDF

GTID:2518306107450134

Subject:Computer technology

Abstract/Summary:

For storing and managing massive data, the RocksDB key-value database, built on the LSM (Log-Structured Merge) tree, has been widely adopted because of its flexible data model, high performance, high availability, and scalability. At the same time, research shows that massive data contains widespread redundancy, for example in sensor data collection in smart homes, network log collection, data monitoring, and many other typical real-time processing scenarios. A large share of the data stored in key-value databases is therefore redundant. If this redundant data can be eliminated, the actual write volume of the key-value storage system is reduced and storage pressure is relieved. How to eliminate data redundancy in key-value databases has thus become an urgent problem.

To address this problem, this thesis designs and implements a deduplication method for key-value systems based on the LSM tree. It mainly includes the following aspects: (1) Value-level data deduplication using key-value-separated storage: no data-chunking step is required when the key-value system performs deduplication, which saves RocksDB resources. (2) Different storage and redundancy-elimination methods are adopted for values of different sizes. Values below a set threshold are not separated from their keys and are stored directly in the native LSM tree, which makes the data in the fingerprint comparison table easier to manage and speeds up fingerprint lookups during comparison in the optimized RocksDB; large key-value pairs go through the deduplication process; and compressible key-value pairs are compressed first to eliminate redundancy before they are stored. (3) The deduplicated key-value pairs are stored in a physically isolated Blob File structure. For redundant data in Blob Files, this thesis optimizes the dynamic garbage collection mode and adopts random garbage collection to release redundant data in RocksDB.

Experimental results show that, compared with a RocksDB deduplication system that does not use the optimizations in this thesis, the larger the key-value pair, the greater the performance improvement of the key-value system. When the key-value size grows from 1K to 16K, the average write time is reduced by 73% and the average read time by 45%. The method also improves to some extent as data redundancy increases: as redundancy rises from 0% to 30%, every 10% increase in redundancy reduces the time spent on write operations by 5% to 10%. At the same time, the consumption of operating system resources has increased...
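The three aspects of the method can be illustrated with a small in-memory sketch. It is not the thesis's implementation and does not use the RocksDB API: the class `DedupKVStore`, the threshold value, and the use of SHA-256 as the fingerprint are all assumptions made for illustration. The sketch keeps small values inline, stores large values once under their fingerprint in a separate blob area, and reclaims unreferenced blobs with a simple reference-count sweep, which stands in for the random garbage collection named in the abstract. Compression is not modeled.

```python
import hashlib

# Hypothetical cutoff in bytes; the abstract does not state the actual threshold.
SMALL_VALUE_THRESHOLD = 1024


class DedupKVStore:
    """Toy model of value-level deduplication with key-value separation."""

    def __init__(self, threshold=SMALL_VALUE_THRESHOLD):
        self.threshold = threshold
        # Stands in for the LSM tree: key -> ("inline", value) or ("blob", fingerprint).
        self.index = {}
        # Stands in for the Blob Files: fingerprint -> value, stored once.
        self.blobs = {}
        # Fingerprint table bookkeeping: fingerprint -> number of referencing keys.
        self.refcount = {}
        self.blob_bytes_written = 0

    def _release(self, key):
        """Drop the key's current entry and its blob reference, if any."""
        entry = self.index.pop(key, None)
        if entry is not None and entry[0] == "blob":
            self.refcount[entry[1]] -= 1

    def put(self, key, value):
        self._release(key)
        if len(value) < self.threshold:
            # Small value: no separation, no fingerprinting.
            self.index[key] = ("inline", value)
            return
        # Large value: fingerprint the whole value (no chunking step).
        fp = hashlib.sha256(value).hexdigest()
        if fp not in self.blobs:
            self.blobs[fp] = value
            self.refcount[fp] = 0
            self.blob_bytes_written += len(value)
        self.refcount[fp] += 1
        self.index[key] = ("blob", fp)

    def get(self, key):
        kind, payload = self.index[key]
        return payload if kind == "inline" else self.blobs[payload]

    def delete(self, key):
        self._release(key)

    def collect_garbage(self):
        """Remove blobs no key references; returns how many were removed."""
        dead = [fp for fp, count in self.refcount.items() if count == 0]
        for fp in dead:
            del self.blobs[fp]
            del self.refcount[fp]
        return len(dead)
```

In this sketch, writing the same large value under two keys adds its bytes to `blob_bytes_written` only once, which mirrors the reduction in actual write volume that the abstract attributes to deduplication.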

Keywords/Search Tags:

LSM-tree, Data redundancy, Data deduplication, RocksDB

Related items

1	Research On Duplicate Data Detection In Data Deduplication
2	Research On High Performance Redundancy Elimination Techniques For Data Backup Systems
3	Research Of Data Deduplication In Data Disaster Tolerance Systems
4	Research On Data Deduplication Technology In Network Storage System
5	Research On Key Technologies Of Data Deduplication For Backup System
6	Study On Data Deduplication Technique For Data Backup Systems
7	HTDRDedu:The Design And Implementation Of A Distributed Backup Data Deduplication System
8	Research And Implementation Of Verifiable Database Based On Rocksdb
9	Design And Implementation Of Distributed Data Deduplication System Based On Chord Protocal
10	The Design And Implementation Of Data Deduplication With Garbage Data Removal Policy