Font Size: a A A

Large-scale Trajectory Similarity Search With Distributed Tries

Posted on:2022-09-08Degree:MasterType:Thesis
Country:ChinaCandidate:L G WengFull Text:PDF
GTID:2480306572497334Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Trajectory data is a sequence of geographic information obtained by sampling the movement process of a object.With the popularity of GPS devices,the scale of trajectory data has grown explosively,which makes it possible to analyze trajectory data to facilitate people’s life.The k nearest neighbor query based on trajectory similarity is one of the basic operations of trajectory analysis.However,the massive trajectory data make the existing stand-alone algorithms unable to efficiently complete the query task.Distributed schemes can use the resources of multiple machines to speed up the query process,but the most advanced distributed schemes have the problems of waste of computing resources and low query efficiency of local index.In order to solve the above problems,we propose a distributed trajectory similarity search framework REPOSE that runs on Spark to process k nearest neighbor query.The framework supports a variety of trajectory similarity measures,including Hausdorff distance,Frechet distance and DTW distance.REPOSE consists of a global partitioning strategy and a local index.The global partitioning strategy is a novel heterogeneous partitioning strategy,which balances the composition of each partition to achieve load balancing and improve the query efficiency of the local index.The local index uses Z-order to discretize the original trajectory into a reference trajectory,and then uses a reference trie to index the reference trajectories.To further improve query efficiency,REPOSE uses a sequence-independent optimization method to improve the performance of the reference trie.In addition,three powerful lower bounds of pruning(one-side lower bound,two-side lower bound and pivot lower bound)are proposed based on the nature of reference trajectories to reduce the number of computations and use intermediate results to reduce the cost of calculating lower bounds.In order to fully demonstrate the effect of REPOSE,its performance was compared with that of the most advanced distributed trajectory similarity search framework on real datasets.The experimental results show that the performance of REPOSE is superior to the existing distributed trajectory similarity search framework.
Keywords/Search Tags:Trajectory, Similarity search, Top-k, Distributed, Heterogeneous
PDF Full Text Request
Related items