
Research On Data Deduplication Method Of Key-value Storage System Based On LSM Tree

Posted on: 2021-03-19  Degree: Master  Type: Thesis
Country: China  Candidate: Z K Zhao  Full Text: PDF
GTID: 2518306107450134  Subject: Computer technology
Abstract/Summary:
For storing and managing massive data, the RocksDB key-value database, built on the LSM (Log-Structured Merge) tree, is widely used because of its flexible data model, high performance, high availability, and scalability. At the same time, research shows that data redundancy is widespread in massive data, for example in sensor data collected in smart homes, network log collection, data monitoring, and many other typical real-time processing scenarios. A large amount of the data stored in key-value databases is therefore redundant. If that redundant data can be eliminated, the actual write volume of the key-value storage system is reduced and storage pressure is relieved. How to eliminate data redundancy in a key-value database has thus become an urgent problem.

To address this problem, this thesis designs and implements a deduplication method for key-value systems based on the LSM tree. It mainly includes the following aspects:

(1) Deduplication is performed at the value level on top of key-value-separated storage, so the key-value system does not need a data-chunking step when it deduplicates, which saves RocksDB resources.

(2) Different storage and redundancy-elimination methods are applied to values of different sizes. A threshold is set: small values skip key-value separation and are stored directly in the native LSM tree, which keeps the fingerprint comparison table manageable and speeds up fingerprint lookups in the optimized RocksDB. Large key-value pairs go through the deduplication process. Compressible key-value pairs are compressed first, so that redundant data is removed before the data is stored.

(3) The key-value pairs that remain after deduplication are stored in a physically isolated Blob File structure. For redundant data inside Blob Files, this thesis optimizes the dynamic garbage collection mode and adopts random garbage collection to release redundant data in RocksDB.

Experimental results show that, compared with a RocksDB deduplication system that does not use the optimizations in this thesis, the larger the key-value pair, the greater the performance improvement. As the key-value size grows from 1K to 16K, the average write time is reduced by 73% and the average read time by 45%. The method's performance also improves to some extent as data redundancy increases: as redundancy rises from 0% to 30%, every 10% increase in redundancy reduces the time spent on write operations by 5% to 10%. At the same time, the consumption of operating system resources has increased...
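The threshold-plus-fingerprint idea in aspects (1) and (2) can be illustrated with a small Python sketch. This is not the thesis's implementation. The class name `DedupKVStore`, the 1024-byte threshold, the use of SHA-256 as the fingerprint, and the in-memory dictionaries standing in for the LSM tree and the Blob Files are all assumptions made for illustration. Compression and the garbage collection policy are left out.

```python
import hashlib


class DedupKVStore:
    """Toy in-memory model: plain dicts stand in for the LSM tree and Blob Files."""

    def __init__(self, threshold=1024):
        self.threshold = threshold  # hypothetical size cutoff in bytes
        self.lsm = {}               # key -> ("inline", value) or ("ref", fingerprint)
        self.blobs = {}             # fingerprint -> value bytes (stand-in for Blob Files)
        self.refcount = {}          # fingerprint -> number of keys pointing at it
        self.bytes_written = 0      # value bytes actually stored

    def put(self, key, value):
        self.delete(key)  # release any reference this key already holds
        if len(value) < self.threshold:
            # Small value: no key-value separation, stored directly in the tree.
            self.lsm[key] = ("inline", value)
            self.bytes_written += len(value)
            return
        # Large value: fingerprint the whole value (no chunking step).
        fp = hashlib.sha256(value).hexdigest()
        if fp not in self.blobs:
            self.blobs[fp] = value
            self.bytes_written += len(value)
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        self.lsm[key] = ("ref", fp)

    def get(self, key):
        kind, payload = self.lsm[key]
        return payload if kind == "inline" else self.blobs[payload]

    def delete(self, key):
        entry = self.lsm.pop(key, None)
        if entry and entry[0] == "ref":
            fp = entry[1]
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:
                # Unreferenced blob; a real system would leave this to garbage collection.
                del self.refcount[fp]
                del self.blobs[fp]


store = DedupKVStore(threshold=1024)
large = b"x" * 4096
store.put("sensor:1", large)
store.put("sensor:2", large)    # duplicate value: only a reference is added
store.put("sensor:3", b"tiny")  # below the threshold: stored inline
```

In this sketch the second write of `large` adds a reference rather than a second copy, which is the mechanism by which deduplication lowers the actual write volume.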
Keywords/Search Tags: LSM-tree, Data redundancy, Data deduplication, RocksDB