Research On Learning-based Hybrid Index

Posted on:2024-05-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Ding

Full Text:PDF

GTID:2568307073968449

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

Learned index is a novel index structure that uses machine learning methods to optimize or even replace traditional indexes.It uses a trained model to predict the position of the key directly without the costly root-to-leaf traversing.However,the performance of existing learned indexes relies on the model’s prediction accuracy.To address this problem,data-aware learned indexes set a threshold to ensure that the prediction error is below the threshold.However,these indexes introduced additional search and space costs because they used a bottom-up strategy to build a tree structure to maintain an error threshold at each level.Besides,static learned indexes cannot handle insert operations as inserts change the existing data distribution and cause the error threshold saved by models to be invalid.Although the state-of-the-art static learned index uses a dynamic index structure to support inserts,it has serious read amplification,and the internal learned indexes need to be retrained periodically.Obviously,this dynamic index structure is not an ideal general framework for static learned indexes.To address the problems of the data-aware learned index,we propose a new learned index called Hybrid Learned Index(HLI).This index effectively improves lookup performance by fixing the height and reducing the number of error corrections inside the index.Meanwhile,this index maintains a high prediction accuracy as it retains the original underlying structure of the data-aware learned index.Moreover,since the inner nodes are models without error bounds,HLI can fit the data distribution at a coarse-grained level with less models and thus achieve similar lookup performance as the state-of-the-art data-aware learned index with lower space occupation.The results show that HLI outperforms PGM-index in lookup performance on four datasets by 1.5× on average.In addition,compared to PGM-index,HLI reduces the index size by up to 13× without decaying the lookup performance.To address the problems of the dynamic index structure,we propose a hybrid index design method that supports inserts for static learned indexes,and establish a Hybrid Index Framework(HIF).Specifically,the dynamic layer is used as a buffer for inserts,and the static layer consisting of static learned indexes is used for lookups only.HIF effectively alleviates read amplification caused by inefficient traditional search methods by searching the static layer directly.And with this hierarchical structure,HIF isolates learned indexes from insert operations and avoids the retraining of the learned indexes by transformation strategy from the dynamic layer to the static layer.Moreover,we provide a self-tuning algorithm for the learned indexes that cannot be built in a single pass over the data,allowing them to be applied to dynamic workloads with low training overhead.We have conducted experiments using multiple datasets and workloads and the results show that on average,three HIF-based static learned indexes achieve up to 1.8×,1.7×,and 1.5× higher throughput than the original dynamic index structure for insert ratio below 70%.

Keywords/Search Tags:

Static learned index, Hybrid index, Data distribution, Insert operation

PDF Full Text Request

Related items

1	Research On Learned Index Based On NVM
2	Research On Learned Index For Database
3	Research On Adaptive Learned Index Supporting Efficient Writes
4	Research And Implementation On Learned Index For Navigation Domain Data
5	A Study Of Learned KD Tree Based On Neural Network
6	Research And Implementation Of Scalable Learned Index Based On String Keys
7	Research On Learned Index For Streaming Data
8	A Study Of Learned Hashing Index Based On Machine Learning
9	Research On Data Access Method Based On Learning Index
10	A Learned Dynamic Cuckoo Filter For Approximate Membership Queries Over Variable-sized Sliding Windows On Data Streams