| To meet the storage requirements of massive unstructured data and overcome the scalability and performance deficiencies of traditional relational databases and file storage, KV stores offer a good solution (a simple data model that is easy to expand) and are deployed in storage facilities. However, with the exponential growth in data size and the high complexity of data types, KV stores face problems at different layers. First, in the storage engine layer, the LSM-tree suffers from serious read and write amplification, especially in large-scale storage. Second, in the data fault-tolerance layer, the unified replica management scheme further aggravates the read and write amplification. Finally, in the application’s access layer, applications often need to access only certain attribute values of the data, but existing KV stores cannot perceive the attribute characteristics of data, resulting in a large number of unnecessary disk I/Os. These problems severely limit the access efficiency and scalability of KV stores. Therefore, optimizing the design across the storage engine layer, the data fault-tolerance layer, and the application’s access layer is the key to building a high-performance and highly reliable KV store. This paper studies the hybrid indexing mechanism of KV stores in the storage engine layer, the replica management mechanism in the data fault-tolerance layer, and the attribute-aware heterogeneous in-memory KV store in the application’s access layer. The main contents and contributions of this paper are as follows. (1) Research on the Hybrid Indexing Mechanism of KV Stores. KV stores usually use the LSM-tree or a hash table as their index structure; however, different index structures have different performance trade-offs. For example, hash indexes provide fast point queries but do not support range queries and have large memory overhead, so they are suitable only for small-scale data storage; the LSM-tree provides fast writes and range queries, but its multi-layered storage architecture leads to
serious read and write amplification. Therefore, a single index structure cannot deliver both high read and high write performance. Moreover, real workloads often exhibit access hot spots; that is, a small amount of data is accessed frequently. Based on these observations, and to solve the problems that a single index structure faces in KV stores, we propose UniKV, a KV store based on a hybrid index architecture that unifies the key design ideas of the hash index and the LSM-tree in one system. First, the data is divided into two layers (hot data and cold data): a hash index is built in memory for the small amount of hot data to speed up access, while the cold data layer adopts a single-layer LSM-tree to ensure good scalability and range query support. In addition, several optimization techniques are proposed to improve overall performance, such as an efficient merge strategy with partial KV separation, a dynamic range partitioning strategy, and a parallel optimization scheme across partitions. Finally, we implemented the prototype system UniKV on top of the open-source system LevelDB. The experimental results show that the overall performance of UniKV is better than that of existing KV stores. Specifically, under mixed read and write workloads, UniKV increases the throughput to 2.0-7.1× that of existing KV stores. (2) Research on Replica Management for Distributed KV Stores. To ensure high data reliability and provide fault tolerance, replication is widely used in distributed KV stores. However, existing replica management simply uses one LSM-tree on each node to store all replicas (primary copies and redundant copies) uniformly. As a result, replication multiplies the amount of data stored in the LSM-tree, which further exacerbates its read and write amplification. To solve these problems, we propose DEPART, a high-performance distributed KV store based on replica decoupling. First, replicas are decoupled by a simple hash
calculation combined with a consistent hash ring, so the decoupling operation is lightweight. In addition, the decoupled primary and redundant copies are stored differently. For primary copies, the LSM-tree is still used for storage, but it is more lightweight, which improves read, write, and range query performance for primary copies. For redundant copies, we propose a two-layer log, which first batch-appends all redundant copies to a global log and then has a background thread split them into multiple local logs, ensuring efficient writes for redundant copies. Besides, fine-grained data management in the local logs also ensures good read performance for redundant copies. Second, we design a tunable ordering scheme for the two-layer log, which uses a single parameter to make the two-layer log favor either writes or reads, so that users can obtain the desired performance improvement by adjusting the degree of ordering. Finally, a parallel recovery scheme is designed to speed up recovery operations. We implemented the prototype system DEPART based on Cassandra. The experimental results show that DEPART improves the read and write throughput of Cassandra by 2.5× and 1.4×, respectively, and reduces the data recovery time by about half. (3) Research on Attribute-Aware Heterogeneous In-Memory KV Stores. The attribute characteristics of data are ubiquitous, and using data attributes for data mining and analysis has great research value. However, existing KV stores cannot perceive the attribute characteristics of data at the storage layer, and they append all attributes to the KV pair in the form of byte strings. When an application needs to analyze and process a data attribute, it must first read the entire KV pair and then parse out the specified attribute, resulting in a large number of unnecessary disk I/Os. To solve these problems, we propose SchemaKV, an attribute-aware heterogeneous in-memory KV store. First, we propose a storage architecture based on DRAM/NVM
heterogeneous memory, which stores all data in NVM to provide large capacity and persistence, and caches a small amount of hot data in DRAM to provide efficient access. Furthermore, an attribute-aware, page-based cache architecture is designed to guarantee low metadata overhead and a high cache hit ratio. Second, an asynchronous data caching framework and a selective caching strategy are designed to reduce the impact of data caching operations on system performance and to make full use of the cache space. In addition, a cache-affinity empty slot selection strategy is designed so that, when data is cached, data at adjacent memory addresses is as ordered as possible, which improves the CPU cache hit ratio. Finally, a lightweight eviction strategy is designed to ensure that the cache has enough space for new data. We implemented the prototype system SchemaKV based on the open-source Chogori-platform. The experimental results show that SchemaKV effectively supports access to data attribute values and provides low-latency access. |
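
The replica decoupling idea described for DEPART in contribution (2) can be illustrated with a minimal Python sketch. It is not the DEPART implementation: the `Ring`, `Node`, and `replicated_put` names, the use of MD5 for ring positions, and the replication factor of 3 are all assumptions made for illustration. A dictionary stands in for the per-node LSM-tree and a list stands in for the append-only global log; the local logs, tunable ordering, and recovery are not modeled.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """Consistent hash ring with one position per node (no virtual nodes)."""

    def __init__(self, node_names, replicas=3):
        self.replicas = replicas
        self.points = sorted((ring_hash(n), n) for n in node_names)
        self.positions = [p for p, _ in self.points]

    def owners(self, key):
        """Return [primary, redundant, ...], walking clockwise from the key."""
        start = bisect.bisect_right(self.positions, ring_hash(key))
        count = min(self.replicas, len(self.points))
        return [self.points[(start + i) % len(self.points)][1]
                for i in range(count)]


class Node:
    """A storage node that keeps primary and redundant copies apart."""

    def __init__(self, name, ring):
        self.name = name
        self.ring = ring
        self.primary_store = {}   # stand-in for the lightweight LSM-tree
        self.redundant_log = []   # stand-in for the append-only global log

    def put(self, key, value):
        # Decoupling step: one hash lookup decides where the copy goes.
        if self.ring.owners(key)[0] == self.name:
            self.primary_store[key] = value
        else:
            self.redundant_log.append((key, value))


def replicated_put(nodes, ring, key, value):
    """Send the write to every replica holder of the key."""
    for name in ring.owners(key):
        nodes[name].put(key, value)


if __name__ == "__main__":
    names = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    ring = Ring(names, replicas=3)
    nodes = {n: Node(n, ring) for n in names}
    for i in range(1000):
        replicated_put(nodes, ring, f"user{i}", f"value{i}")
    for n in names:
        print(n, len(nodes[n].primary_store), len(nodes[n].redundant_log))
```

In this sketch each node's dictionary holds only the keys for which it is the primary, so the structure standing in for the LSM-tree stays at roughly one third of the node's total data under three-way replication, while the redundant copies are absorbed by sequential appends.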