Research On Subsequence Similarity Retrieval Technology For Time Series

Posted on: 2023-12-21

Degree: Master

Type: Thesis

Country: China

Candidate: M R Zhang

Full Text: PDF

GTID: 2530307055451174

Subject: Computer Science and Technology

Abstract/Summary:

Similarity retrieval is a common operation in large-scale, data-driven applications and a core subroutine of time series data mining algorithms. Time series representations and complex similarity measures form the basis of time series similarity research and play a vital role in similarity search. Subsequence similarity retrieval lies at the core of time series data mining and is studied in a growing range of application fields, including neuroscience, finance, meteorology, human health monitoring, and data retrieval. However, the massive volume and high dimensionality of data sequences make mining them considerably more difficult, so domain experts who have collected time series still face the problem of analyzing and processing them. The common approach is to compute a feature representation that effectively reduces the dimensionality of the data and then to apply a distance measure to judge similarity. Subsequence similarity retrieval is thus realized by combining a time series representation with a similarity measure.

To address the high dimensionality and computational cost of massive sequence databases, this thesis proposes two new indexing techniques for similarity retrieval of time series subsequences:

(1) An index construction method similar to a B+ tree. The input time series are first normalized and reduced in dimensionality by Piecewise Aggregate Approximation (PAA), and the PAA output is then discretized. The index tree is built from the discretized sequences, and similarity queries for variable-length subsequences are then run against it.

(2) A new hash mapping function based on data-independent hashing. The core idea is to hash the points in the data set so that points close to each other collide with far higher probability than points far apart. At query time, the query point is hashed into a bucket with the same hash function, all points in that bucket are taken as candidate approximate nearest neighbors, and the distance between the query point and each candidate is computed to decide whether it satisfies the query condition.

Extensive experiments show that the proposed indexing algorithms make similarity retrieval of variable-length subsequences more concise and efficient.
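The preprocessing in method (1) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not say how the PAA output is discretized, so the sketch assumes SAX-style equiprobable Gaussian breakpoints. The function names (`z_normalize`, `paa`, `discretize`, `to_word`) and the four-letter alphabet are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def z_normalize(series):
    """Rescale a series to zero mean and unit variance."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        # Assumption of this sketch: the length divides evenly into segments.
        raise ValueError("series length must be divisible by segments")
    width = n // segments
    return [fmean(series[i * width:(i + 1) * width]) for i in range(segments)]


def discretize(values, alphabet="abcd"):
    """Map each value to a symbol using equiprobable N(0, 1) breakpoints.

    Assumption: SAX-style breakpoints; the thesis may discretize differently.
    """
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in values)


def to_word(series, segments, alphabet="abcd"):
    """Normalize, reduce with PAA, then discretize into a symbolic word."""
    return discretize(paa(z_normalize(series), segments), alphabet)
```

For example, `to_word([1, 2, 3, 4, 5, 6, 7, 8], 4)` yields `"abcd"`. Words like this could serve as keys in a tree index; how the thesis builds its B+-tree-like structure from them is not described in the abstract.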
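The hash-then-verify query flow of method (2) can be illustrated with a generic data-independent locality-sensitive hash. The abstract does not define the proposed hash function, so this sketch substitutes a standard random-projection scheme, h(v) = floor((a·v + b) / w), for Euclidean distance. The class name `RandomProjectionLSH` and the parameters `num_hashes`, `bucket_width`, and `seed` are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


class RandomProjectionLSH:
    """Single-table LSH sketch: nearby points tend to share a bucket key."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # The projections are drawn at random, independently of the data.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_hashes)]
        self.offsets = [rng.uniform(0.0, bucket_width)
                        for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def _key(self, point):
        """Concatenate the quantized projections into one bucket key."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(plane, point)) + b) / self.w)
            for plane, b in zip(self.planes, self.offsets)
        )

    def insert(self, point):
        self.buckets[self._key(point)].append(tuple(point))

    def query(self, point, radius):
        """Hash the query, then verify each candidate in its bucket exactly."""
        candidates = self.buckets.get(self._key(point), [])
        q = tuple(point)
        return [c for c in candidates if math.dist(c, q) <= radius]
```

Because only one bucket is probed, true neighbors that land in a different bucket are missed. Practical schemes usually reduce that risk with several independent tables, a detail this sketch leaves out.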

Keywords/Search Tags:

Time series, Sub-sequence query, Sequence representation, Similarity measure, Similarity search

Related items

1	Time Series Prediction Based On Similarity Search And Its Application
2	Research On Feature Representation And Similarity Measurement Method Of Time Series
3	Research On Similarity Comparison Algorithm For DNA Sequence
4	Research On Time Series Classification Algorithm And Application Based On Shape Features
5	Research On Similarity Analysis Method For Gene Data
6	Research On Task Scheduling And Resource Provision For SaaS Applications
7	A Novel Similarity Measure Model For Multivariate Time Series Based On LMNN And DTW
8	GML Temporal Clustering And Temporal Sequence Similarity Search Key Research Questions
9	Research And Application Of Hydrological Time Series Similarity Model
10	Evolutionary Tree Algorithm Based On Similarity Analysis Of DNA Sequence 4d Study