| Whenever data needs to be retrieved quickly in a database,indexing is the preferred measure.Traditional database indexes are designed for general purpose,such as B-Tree,R-Tree,and these index structures do not take advantage of the data distribution characteristics.Recent work by scholars has embedded machine learning models into database queries to obtain both faster query speed and reduced storage space consumption.Inspired by learning indexes,i.e.,looking at indexes as models,this paper designs two indexes for point queries,range queries and KNN queries,i.e.,Z-Index and Y-Index on multidimensional space.The core idea of Z-Index is to divide the data space into multiple rectangular partitions by performing space-based partitioning,such as grid partitioning,on the whole data space,and then using Lebesgue measures to map the points within each rectangular partition to one dimension.The values after the dimensional reduction are sorted,and then a learning model is used to efficiently predict the order of the values after the dimensional reduction.In this paper,we use a rectangular partition to train a learning model instead of sharing a model for the whole space,which can greatly reduce the complexity of a single model.In other words,the Z-Index consists of the following parts.The first part chooses a space-based partitioning method,such as grid partitioning,to divide the entire space into multiple independent sub-partitions.Second,for each sub-partition choose a dimensional reduction method,such as Z-order or Lebesgue measure(the latter is adopted in this paper)to reduce the points of the sub-partition to one dimension,and choose a sorting algorithm for sorting.Thirdly,a learning model is chosen independently for each sub-partition to predict the one-dimensional value of each partition after dimensional reduction.Finally,in order to improve the query performance,a parallel structure of Z-Index is also designed.Based on Z-Index,this paper designs algorithms for point query,rectangular range query and KNN query,where KNN query is implemented as multiple incremental rectangular range queries.Another index is Y-Index,an index based on data distribution.The core idea of Y-Index is to partition the whole data set based on data,first clustering the data into clusters one by one.Then one or more reference points are selected for each cluster,and the points within each cluster are scaled to one dimension based on the distance from the point to the reference point.Next,as in Z-Index,the one-dimensional values inside each cluster are sorted and a learning model is selected to predict the one-dimensional values after each partition,also one model is selected for each cluster.Based on Y-Index,this paper designs algorithms for point query and range query,where the range query algorithm uses triangular inequalities to exclude irrelevant clusters and improve the efficiency of the query.And this design of one model for one sub-partition makes it easier to design parallel algorithms in terms of algorithms that take advantage of the parallelism capability of multi-core operating systems.A large number of experiments prove that the Z-Index and Y-Index designed in this paper outperform the traditional R-tree and other learned indexes in terms of storage space as well as query efficiency. |