| In recent years, with the rapid development of wireless communication technology and the prevalence of positioning devices, the volume of vehicle trajectory data has grown rapidly. Conventional spatiotemporal indexes that support queries over historical vehicle trajectories usually grow in size with the amount of data, leading to poor query performance. Advances in machine learning have made learned indexes available for historical trajectory queries. However, trajectory data is commonly skewed: trajectory points may cluster along the time or space dimensions, which degrades both the distribution fitting and the query processing of a learned index. Therefore, a learned index for historical trajectories based on spatial mapping is proposed. It is constructed in two stages, trajectory data ordering and model training. The former maps trajectory points onto a one-dimensional space, while the latter trains the storage location model, which can fit different distributions of trajectory data and reduce the storage cost of the model. The work is as follows: (1) Multi-dimensional learned indexes require the data to be ordered, so the data is sorted based on G-Tree partitioning to map the three-dimensional trajectory points onto the one-dimensional space, thus providing storage location parameters for model training. The first step is to use the G-Tree algorithm to divide the trajectory points into a series of partitions and sort them based on the encoding. In the second step, the center point of the trajectory data in each partition is selected as the reference point, and the local ordering of trajectory points in the partition is determined by the distance between each trajectory point and the reference point. Then, a genetic algorithm is used to optimize the local ordering so that trajectory points adjacent in the ordering are also adjacent in spatiotemporal space. Finally, the global sort number is determined from the partition attributes and the local ordering. (2) In
order to support query processing over historical trajectories, a storage location prediction model is designed based on an ensemble technique. Specifically, a learned index for historical trajectories based on a linear regression tree is proposed, which is built in two stages. In the first stage, the historical trajectory data is divided into a sequence of periods, and a gapped array is employed to store the data of all periods. An ensemble model based on AdaBoost is trained on the data of the initial period, with the linear regression tree acting as the base model to fit the different distributions of trajectory data. In the second stage, the data of subsequent periods is inserted into the gapped array incrementally to support historical trajectory queries. The main contributions of this paper are as follows: (1) A trajectory point sorting method based on G-Tree partitioning is proposed, which ensures that the storage locations of trajectory points are consistent with their spatiotemporal proximity. (2) A linear regression tree is proposed, which can fit different trajectory data distributions and reduce the model size. (3) An ensemble model based on AdaBoost is proposed to improve the accuracy of the prediction model while preserving generalization and reducing the number of model training iterations. |
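
The ordering stage described in item (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `global_sort_numbers` and the `partition_code` callable are hypothetical names, the callable merely stands in for the G-Tree encoding, the coordinates are assumed to be normalized so that spatial and temporal distances are comparable, and the genetic-algorithm refinement of the local ordering is omitted.

```python
import math
from collections import defaultdict


def global_sort_numbers(points, partition_code):
    """Assign each (x, y, t) point a one-dimensional sort number.

    points: list of (x, y, t) tuples, assumed to be normalized already
            (an assumption, not stated in the abstract).
    partition_code: hypothetical callable mapping a point to a sortable
            partition code; it stands in for the G-Tree encoding.
    """
    # Group point indices by partition code.
    partitions = defaultdict(list)
    for idx, p in enumerate(points):
        partitions[partition_code(p)].append(idx)

    sort_number = {}
    next_number = 0
    # Visit partitions in ascending order of their code.
    for code in sorted(partitions):
        members = partitions[code]
        # Reference point: the centroid of the partition's points.
        centroid = tuple(
            sum(points[i][d] for i in members) / len(members)
            for d in range(3)
        )
        # Local ordering: ascending distance to the reference point,
        # with the point index as a deterministic tie-breaker.
        members.sort(key=lambda i: (math.dist(points[i], centroid), i))
        # Global number = count of points in earlier partitions + local rank.
        for i in members:
            sort_number[i] = next_number
            next_number += 1
    return sort_number
```

In this sketch the global sort number is simply a running counter over partitions visited in code order, so points in the same partition receive contiguous numbers. These numbers are the kind of storage location targets that a prediction model could then be trained on.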