A time series is an ordered sequence of measurements. It is one of the most challenging objects in data mining, characterized by high dimensionality and a dynamic nature. Representation and similarity measures are two important research directions for time series, and they are often studied together. Representation helps to reduce dimensionality, suppress noise and preserve prominent features. Similarity measures form the basis for pattern matching, and thus for mining time series.

In this thesis, we study time series in both the time domain and the frequency domain. Based on scales, we propose two time series representation methods: a predefined pattern detection method and a baseline correction method. The predefined pattern detection method extracts matched instances from a time series effectively while accounting for temporal and magnitude deformations. It is based on the notions of templates, landmarks, constraints and trust regions: it first transforms the time series and the templates into landmark sequences, then extracts the landmark subsequences that satisfy the defined constraints, and finally models the obtained landmark segments (time series subsequences) using trust regions. The method also employs the Minimum Description Length (MDL) principle in the preprocessing step, which preserves the prominent features and prevents the templates from over-fitting.

A baseline can be viewed as a large-scale component of a time series. Recognizing and correcting it helps us better understand the trends and patterns in the series. Based on the Probability Density Function (PDF), we propose a new piecewise baseline detection method, the most-crossing method. Unlike other piecewise methods, it treats the data points of the time series differently, classifying them into peak points and noise points.
Even when the noise level is high, the method still performs well.

All the methods above involve parameter selection, such as the sliding-window size, the slope threshold, the smoothing scale and the similarity threshold. By combining MDL with the PDF, we enable the methods to select suitable parameters for a given dataset automatically, avoiding the errors caused by manually chosen parameters.

The datasets employed in this thesis were collected with a sensor network installed on a Dutch highway bridge. The network comprises three kinds of sensors, and the datasets collected with these sensors also differ. According to the sensor properties, we transform and model pairs of time series of two different types at different scales, and detect their dependencies (correlations). Finally, by combining further sensor properties, such as location and installation, we perform a second learning step on the obtained dependencies and derive a number of useful rules. These rules can be used to build cost-effective, well-performing sensor networks in the future.
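The landmark transformation mentioned above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `extract_landmarks`, the `slope_threshold` parameter, and the choice of landmarks as slope-sign changes (local extrema, plus the two endpoints) are all assumptions made for illustration.

```python
def extract_landmarks(series, slope_threshold=0.0):
    """Sketch: reduce a time series (a list of values) to a landmark
    sequence of (index, value) pairs at local extrema and endpoints.

    A point is treated as a landmark when the slope changes sign there
    and both adjacent slopes exceed a magnitude threshold, so that
    small noise wiggles can be suppressed.
    """
    n = len(series)
    landmarks = [(0, series[0])]          # keep the first point
    for i in range(1, n - 1):
        left = series[i] - series[i - 1]   # slope entering point i
        right = series[i + 1] - series[i]  # slope leaving point i
        if left * right < 0 and min(abs(left), abs(right)) > slope_threshold:
            landmarks.append((i, series[i]))
    landmarks.append((n - 1, series[n - 1]))  # keep the last point
    return landmarks

# For example, a zig-zag series keeps only its turning points:
# extract_landmarks([0, 1, 2, 1, 0, 1, 2]) -> [(0, 0), (2, 2), (4, 0), (6, 2)]
```

Raising `slope_threshold` discards shallow extrema, which plays a role similar to the slope-threshold parameter that the automatic MDL/PDF-based selection discussed above would have to set.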