Dynamic Time Warping Oversampling Methods For Imbalanced Time Series

Posted on: 2019-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: F Li
GTID: 2370330566993744
Subject: Applied Economics
Abstract/Summary:
Time series classification is widely used in motion recognition, speech recognition, anomaly detection, medical disease detection, and other fields. Such classification problems frequently suffer from class imbalance. Most existing data mining algorithms assume a roughly balanced class distribution, so imbalanced data often lead to unsatisfactory learning performance. Moreover, unlike cross-sectional data, time series are high-dimensional, shift-invariant, multi-scale, and exhibit complex temporal dynamics, so existing methods for imbalanced cross-sectional data cannot be applied directly to imbalanced time series.

We propose a novel oversampling technique for imbalanced time series that accounts for high dimensionality, shift invariance, multi-scale characteristics, and complex dynamics. In the metric space defined by Dynamic Time Warping (DTW), we divide the minority class cases into a safe group and a noisy group, reweight the sampling distribution, select k-nearest neighbors, and interpolate points along each warping path. According to the reweighted sampling distribution, new time series are then generated between a minority case and a minority case randomly selected from among its k-nearest minority neighbors. We selected 12 typical imbalanced datasets from the UCR time series repository and applied a Gaussian Process Classifier (GPC) to test the effectiveness of our method. The extensive comparison results show that:

(1) The new samples generated by our oversampling method (SDTW) are more homogeneous than those generated by ROS, SMOTE, and BSMOTE, the boundary between the two classes is clearer, and the new data set preserves the distribution of the original data set. In addition, only minority samples are selected as k-nearest neighbors when generating new samples, which overcomes a shortcoming of the k-nearest neighbor selection process in SMOTE. Furthermore, no new samples are generated around the minority samples in the noisy group, which effectively prevents the introduction of additional noise.

(2) Processing the imbalanced time series data improves the performance of GPC. Compared with existing sampling techniques such as ADASYN, SMOTE, and BSMOTE, SDTW performs better on several evaluation indexes for imbalanced classification, including accuracy, F-value, G-mean, and AUC. The Friedman test shows that the methods differ in performance and that SDTW is significantly better than the other methods in accuracy, F-value, G-mean, and AUC.

(3) Empirical analysis on data sets with different imbalance ratios shows that SDTW achieves the best performance on these evaluation indexes.

This paper verifies the superiority of our method for processing imbalanced time series, and the method can be applied to real-world time series classification tasks.
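The oversampling procedure summarized above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the thesis's SDTW implementation: DTW uses an absolute-difference cost with no warping window; a minority case is treated as noisy when all of its k DTW-nearest neighbors in the whole training set belong to the majority class (a borderline-SMOTE-style rule the abstract does not spell out); the "reweighted sampling distribution" is approximated by drawing base cases uniformly from the safe group with zero weight on noisy cases; and each synthetic series uses one random gap applied along the warping path, then is resampled back to the original length. All function names (`dtw_path`, `knn_dtw`, `split_safe_noisy`, `interpolate_along_path`, `dtw_oversample`) are hypothetical.

```python
import numpy as np


def dtw_path(a, b):
    """Classic O(n*m) DTW with absolute-difference cost and no window.

    Returns (distance, path), where path is the optimal warping path as a
    list of (i, j) index pairs running from (0, 0) to (n-1, m-1).
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the bottom-right cell, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(n - 1, m - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    return float(cost[n, m]), path[::-1]


def knn_dtw(query, X, candidates, k):
    """Indices of the k DTW-nearest series to X[query] among `candidates`."""
    others = [c for c in candidates if c != query]
    dists = [dtw_path(X[query], X[c])[0] for c in others]
    return [others[o] for o in np.argsort(dists, kind="stable")[:k]]


def split_safe_noisy(X, y, minority_label, k):
    """Assumed rule: a minority case is noisy if all k DTW neighbors are majority."""
    everyone = list(range(len(X)))
    safe, noisy = [], []
    for i in everyone:
        if y[i] != minority_label:
            continue
        nbrs = knn_dtw(i, X, everyone, k)
        if all(y[n] != minority_label for n in nbrs):
            noisy.append(i)
        else:
            safe.append(i)
    return safe, noisy


def interpolate_along_path(a, b, rng):
    """Interpolate between DTW-aligned points of a and b along their warping path."""
    _, path = dtw_path(a, b)
    gap = rng.random()  # one random gap in [0, 1) per synthetic series, as in SMOTE
    warped = np.array([a[i] + gap * (b[j] - a[i]) for i, j in path])
    # The path is at least as long as a, so resample linearly back to len(a).
    return np.interp(np.linspace(0.0, 1.0, len(a)),
                     np.linspace(0.0, 1.0, len(warped)), warped)


def dtw_oversample(X, y, minority_label, n_new, k=3, seed=0):
    """Generate n_new synthetic minority series (hypothetical SDTW-like sketch)."""
    rng = np.random.default_rng(seed)
    X = [np.asarray(x, dtype=float) for x in X]
    safe, _noisy = split_safe_noisy(X, y, minority_label, k)
    if not safe or n_new <= 0:
        return np.empty((0, len(X[0])))
    minority = [i for i in range(len(X)) if y[i] == minority_label]
    out = []
    for _ in range(n_new):
        # Assumed reweighting: uniform over the safe group, zero weight on noisy cases.
        base = safe[rng.integers(len(safe))]
        # Neighbors are drawn from the minority class only.
        nbrs = knn_dtw(base, X, minority, k)
        partner = nbrs[rng.integers(len(nbrs))]
        out.append(interpolate_along_path(X[base], X[partner], rng))
    return np.vstack(out)
```

Because every synthetic series is a pointwise convex combination of two safe-side minority series, it stays inside their value range, and noisy minority cases never act as a base. The pure-Python DTW is O(nm) per pair, so a windowed or compiled DTW would be needed to run this on full UCR datasets.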
Keywords/Search Tags: Imbalanced Time Series, Dynamic Time Warping, Oversampling, Gaussian Process Classifier