| Learned index is a new indexing paradigm that aims to reduce space cost and improve query efficiency by using machine learning models to learn the distribution of data on storage media. Most current learned indexes rely, entirely or partly, on the strong assumption that data is one-dimensional, static, and stored in an in-memory array in ascending order. However, in the era of big data, and especially in application scenarios typified by streaming data, such learned indexes are of little practical use, because the multi-dimensional and dynamic nature of streaming data contradicts the strong assumptions on which existing work relies. To address this problem, this thesis extends the application scope of learned indexes to the streaming data scenario for the first time. The specific work of this thesis is as follows: (1) This thesis proposes a single-dimensional dynamic learned index framework that supports frequent data updates. The framework consists of three parts: a storage structure, a learning model, and a feedback mechanism. The storage structure, based on a packed memory array, supports data insertion with theoretical performance guarantees. A segment-based group of learning models is proposed to fit the mapping from data keys to offset positions within data segments. The feedback mechanism promptly reflects data updates on the storage structure in the parameters of the learning models. Experimental results show that this framework improves query efficiency and reduces the time and computing-resource costs of parameter retraining, at the cost of only a small amount of additional space. (2) This thesis improves the projection strategy that learned indexes use for multi-dimensional data and designs the corresponding query algorithms. Multi-dimensional data has no natural order, so a learned index must map the data to a single-dimensional space through a projection strategy when processing multi-dimensional queries. To solve the problem of distribution skew in streaming data after mapping, this thesis proposes improvements to the projection strategy that target the skew caused by distribution drift and by dimension correlation. Based on the dynamic learned index framework and the projection strategy, this thesis proposes range query and kNN query algorithms for multi-dimensional spatial data. Experimental results show that the proposed method achieves higher query efficiency than the R+ tree and the KD tree. |
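
The idea of a segment-based model group that maps keys to offsets within data segments can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `SegmentedLearnedIndex`, the fixed `segment_size`, and the choice of one least-squares linear model per segment are all assumptions made for illustration, and the sketch is static (it omits the packed memory array and the feedback mechanism). Each model predicts an offset inside its segment, and the recorded maximum error bounds a short local search.

```python
from bisect import bisect_left, bisect_right


class SegmentedLearnedIndex:
    """Toy static learned index (hypothetical): one linear model per segment."""

    def __init__(self, keys, segment_size=64):
        self.keys = sorted(keys)
        # Each entry: (first_key, start, end, slope, intercept, max_err)
        self.segments = []
        for start in range(0, len(self.keys), segment_size):
            end = min(start + segment_size, len(self.keys))
            self.segments.append(self._fit(start, end))
        self.first_keys = [seg[0] for seg in self.segments]

    def _fit(self, start, end):
        """Least-squares fit of key -> offset within keys[start:end]."""
        xs = self.keys[start:end]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(xs))
        slope = cov / var if var > 0 else 0.0
        intercept = mean_y - slope * mean_x
        # Largest prediction error on this segment; bounds the local search.
        max_err = max(
            abs(round(slope * x + intercept) - i) for i, x in enumerate(xs)
        )
        return (xs[0], start, end, slope, intercept, max_err)

    def lookup(self, key):
        """Return a position of key in the sorted array, or None if absent."""
        # Route to the last segment whose first key is <= key.
        s = bisect_right(self.first_keys, key) - 1
        if s < 0:
            return None
        _, start, end, slope, intercept, max_err = self.segments[s]
        # Predict the offset, clamp it into the segment, then search
        # only inside the error window around the prediction.
        guess = min(max(round(slope * key + intercept), 0), end - start - 1)
        lo = max(start + guess - max_err, start)
        hi = min(start + guess + max_err + 1, end)
        pos = bisect_left(self.keys, key, lo, hi)
        if pos < hi and self.keys[pos] == key:
            return pos
        return None


index = SegmentedLearnedIndex([3, 8, 15, 16, 23, 42], segment_size=4)
print(index.lookup(23))  # 4
print(index.lookup(20))  # None
```

In a dynamic setting such as the one the thesis targets, an insertion would shift offsets inside a segment, so the affected segment's `slope`, `intercept`, and `max_err` would need to be updated; the sketch leaves that step out.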