Research On Subsequence Similarity Retrieval Technology For Time Serie

Posted on: 2023-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: M R Zhang
Full Text: PDF
GTID: 2530307055451174
Subject: Computer Science and Technology
Abstract/Summary:
Similarity retrieval in time series is a common operation in large-scale data-driven applications and a core subroutine of time series data mining algorithms. Time series representations and similarity measures form the basis of time series similarity research and play a vital role in similarity search. Subsequence similarity retrieval lies at the core of data mining technology, and it is studied in a growing range of application fields (including neuroscience, finance, meteorology, human health detection, and data retrieval). However, the massive volume and high dimensionality of data sequences make data mining considerably more difficult, so domain experts who have collected time series still face the problem of analyzing and processing them. The common approach is to represent the data sequence by features that effectively reduce its dimensionality, and then to use a distance measure to judge similarity. Subsequence similarity retrieval is thus realized by combining a time series representation with a similarity measure. To address the high dimensionality and computational cost of massive sequence databases, this thesis proposes two new indexing techniques for similarity retrieval of time series subsequences. The details are as follows:

(1) An index construction method similar to the B+ tree is proposed. First, the input time series are normalized and reduced in dimensionality by Piecewise Aggregate Approximation (PAA), and the PAA results are then discretized. Next, the index tree is built from the discretized sequences. Finally, similarity queries for variable-length subsequences are performed on the constructed index tree.

(2) A new hash mapping function is proposed, based on data-independent hashing. The core idea is to hash the points in the data set so that points close to each other collide with much higher probability than points far apart. At query time, the query point is hashed into a bucket with the same hash function, all points in that bucket are taken as candidate approximate nearest neighbors, and finally the distance between the query point and each candidate is computed to decide whether it satisfies the query condition.

Extensive experiments show that the proposed indexing algorithms make similarity retrieval of variable-length subsequences more concise and efficient.
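The preprocessing in (1), normalization followed by PAA and discretization, can be illustrated with a minimal Python sketch. The abstract does not say which normalization or discretization scheme the thesis uses, so the z-normalization, the four-letter alphabet, and the Gaussian breakpoints below are assumptions borrowed from the common SAX convention, and all function names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Assumed: breakpoints that split N(0, 1) into 4 equiprobable regions (SAX convention).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def z_normalize(series):
    """Shift to zero mean and scale to unit variance (assumed normalization)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]  # constant series: no variation to scale
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("this sketch requires len(series) divisible by segments")
    width = n // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def discretize(paa_values):
    """Map each PAA value to the letter of the breakpoint region it falls in."""
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in paa_values)


def to_word(series, segments):
    """Full pipeline: normalize, reduce with PAA, then discretize."""
    return discretize(paa(z_normalize(series), segments))


if __name__ == "__main__":
    print(paa([1, 2, 3, 4, 5, 6, 7, 8], 4))
    print(to_word([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

The resulting words are short strings over a small alphabet, which is what makes them usable as keys in a tree-structured index. How the thesis orders and stores them in its B+-tree-like structure is not described in the abstract and is not modeled here.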
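The query procedure in (2) follows the general pattern of locality-sensitive hashing: hash, collect the bucket, then verify by true distance. The sketch below shows that pattern in Python. The thesis's own hash function is not given in the abstract, so the random-projection hash h(v) = floor((a·v + b) / w) used here is a swapped-in, well-known data-independent family, and the class name and parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


class RandomProjectionLSH:
    """Single-table LSH index using random projections (assumed hash family)."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Projection vectors and offsets are drawn independently of the data.
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)
        ]
        self.offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def _key(self, point):
        """Concatenate the hash values; nearby points tend to share the same key."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, point)) + b) / self.w)
            for proj, b in zip(self.projections, self.offsets)
        )

    def insert(self, point):
        self.buckets[self._key(point)].append(tuple(point))

    def query(self, point, radius):
        """Take the query's bucket as candidates, then verify each by true distance."""
        candidates = self.buckets.get(self._key(point), [])
        return [c for c in candidates if math.dist(c, point) <= radius]


if __name__ == "__main__":
    index = RandomProjectionLSH(dim=3)
    index.insert((1.0, 2.0, 3.0))
    index.insert((50.0, -40.0, 9.0))
    print(index.query((1.0, 2.0, 3.0), radius=0.5))
```

A single hash table can miss true neighbors that fall just across a bucket boundary, which is why LSH schemes commonly query several independent tables and merge the candidates. This sketch keeps one table for brevity.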
Keywords/Search Tags: Time series, Sub-sequence query, Sequence representation, Similarity measure, Similarity search