Research On Performance Optimization Of Solid-State Drive-Based Key-Value Separated Stores

Posted on:2024-05-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C L Tang

GTID:1528307319962509

Subject:Computer system architecture

Abstract/Summary:

In the big data era, applications such as social media, mail services, and video sites produce massive amounts of unstructured data, which poses new challenges to storage systems. Key-value (KV) stores based on the Log-Structured Merge-Tree (LSM-tree) can access this unstructured data efficiently in real-world scenarios and have therefore become a core technology of the storage infrastructure in modern data centers. However, the LSM-tree based KV store was designed for conventional HDDs and cannot fully exploit the bandwidth of today's widely used SSDs. The compaction operations of the LSM-tree not only cause large read/write amplification and degrade the read/write performance of host applications, but also significantly reduce the lifetime of SSDs and increase the cost of data storage. Although KV separation can mitigate the read/write amplification of the LSM-tree, it brings problems of its own: (1) it leads to poor range query performance; (2) it cannot fully exploit the bandwidth of SSDs; (3) it cannot optimize primary and secondary indices properly. To address these problems, we investigate optimizations of the LSM-tree and of data organization to cope with the performance issues of KV separated stores.

A KV separated store appends KV pairs in a log-structured manner to a separate value log area, which maximizes write performance. However, this leaves KV pairs scattered across the value log and thus degrades range query performance for small KV pairs. To address this issue, we propose Fence KV, an efficient KV separated store that optimizes range query performance while providing reasonable write performance. First, Fence KV adopts a key-range based data grouping method: the value log area is divided into multiple groups whose key ranges do not overlap. Second, Fence KV uses a garbage collection (GC) policy based on the key range, which mitigates GC overhead while reorganizing KV pairs to optimize range query performance. Evaluation results show that, compared with existing schemes, Fence KV improves range query throughput by 53%, while write throughput decreases by only 7.6%.

Primary and secondary indices share the same value in the value log area. However, this shared-value scheme cannot satisfy the different orderings that the two indices require, so range queries on either index struggle to fully exploit the bandwidth of SSDs. To address this issue, we propose RISE, a KV separated store that accelerates range queries on the primary and secondary indices simultaneously. First, we run an existing KV separation scheme to analyze the access characteristics of SSDs and observe that range query performance improves when KV pairs are stored in either strict or loose sequentiality. RISE then adopts a key-range based data grouping method to divide the value log into multiple groups, but relaxes the internal order of KV pairs to achieve loose sequentiality for the primary index. Next, RISE uses a co-location GC policy to maintain strict sequentiality for the secondary index. Finally, RISE employs a parallel value parsing policy to accelerate value parsing during GC. Evaluation results show that RISE improves range query throughput by 23% for the primary index and by 31% for the secondary index, and accelerates value parsing during GC by 17.9%.

Different secondary indices also conflict with one another in a KV separated store, which makes it even harder for them to exploit the bandwidth of SSDs; in addition, updating primary and secondary indices is inefficient. To address these issues, we propose Rep KV, a KV separated store that exploits SSD bandwidth and speeds up operations on multiple indices. First, Rep KV adopts a primary-backup scheme: each copy stores the same KV pairs but organizes the data differently to fully exploit the bandwidth of SSDs. Second, Rep KV uses a lightweight replication scheme to reduce the KV pairs that must be synced to replication nodes. Third, Rep KV uses a parallel parsing policy to optimize value parsing in the write operation. Evaluation results show that Rep KV improves the point lookup throughput of the secondary index by 22.24%, the range query throughput of the primary and secondary indices by 31.38%, and write throughput by 8.63%. In addition, Rep KV reduces the write amplification of replication nodes by up to 3.05×.
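
The key-range based grouping of the value log described in the abstract can be illustrated with a small Python sketch. This is not the dissertation's implementation: the class name `GroupedValueLog`, the `fences` parameter, and the in-memory lists standing in for on-SSD log groups are all assumptions made for illustration, and the LSM-tree index that a real KV separated store keeps alongside the value log is omitted. The sketch only shows why non-overlapping key ranges let a range query touch fewer groups, and how a per-group GC pass can drop stale versions and restore key order.

```python
import bisect


class GroupedValueLog:
    """Toy value log split into groups with non-overlapping key ranges."""

    def __init__(self, fences):
        # fences: lower bound of each group's key range (hypothetical parameter).
        self.fences = sorted(fences)
        # Each group is an append-only list standing in for a log region.
        self.groups = [[] for _ in self.fences]

    def _group_index(self, key):
        # Last fence <= key; keys below the first fence fall into group 0.
        return max(bisect.bisect_right(self.fences, key) - 1, 0)

    def put(self, key, value):
        # Writes stay append-only inside the group that owns the key.
        self.groups[self._group_index(key)].append((key, value))

    def range_query(self, lo, hi):
        # Only groups whose key range can intersect [lo, hi] are scanned.
        first, last = self._group_index(lo), self._group_index(hi)
        latest = {}
        for group in self.groups[first:last + 1]:
            for key, value in group:  # later appends overwrite earlier ones
                if lo <= key <= hi:
                    latest[key] = value
        return sorted(latest.items()), last - first + 1

    def gc(self, idx):
        # Keep only the newest version of each key and rewrite the group
        # in key order, so later range scans read it sequentially.
        self.groups[idx] = sorted(dict(self.groups[idx]).items())


log = GroupedValueLog(["a", "h", "p"])
for k, v in [("k", "1"), ("b", "2"), ("k", "3"), ("q", "4"), ("i", "5")]:
    log.put(k, v)
rows, groups_scanned = log.range_query("i", "l")
log.gc(1)
```

In this toy run the query for keys between "i" and "l" scans one of the three groups, and the GC pass on that group discards the stale version of "k". How group boundaries are chosen and when GC is triggered are design decisions the abstract does not specify, so they are left out here.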

Keywords/Search Tags:

Key-value store, Key-value separation, LSM-tree, Solid-state drive

Related items

1	Research On LSM-tree Key-value Store Based On High-density SSD
2	Research And Implementation Of A Plane Key-Value Store Based On Hybrid Structure
3	Research On LSM-tree Based Key-value Store On Open-channel SSD Features
4	Research On LSM-Tree Key-Value Stores Based On Address Remapping
5	Research On The Firmware Optimization In Solid State Drive
6	Research On SSD-Based LSM-Tree Key-Value Separation System
7	Research On SSD-based LSM-Tree Key-Value Storage System
8	Design And Implementation Of Solid State Drive Controller Based On SATA 2.0 Interface
9	Research On Write Amplification Optimization Of Key Value Store Based On Open-channel SSD
10	The Research Of ECC Based On LDPC For Applications In Solid State Disk Drive