Research And Implementation Of Time Series Classification Based On Semi-supervised Learning

Posted on: 2012-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: L X Wu
GTID: 2210330368987897
Subject: Computer application technology
Abstract/Summary:
Time series are widely used in many areas, including speech recognition and financial management, and time series classification is an important field of data mining. Traditional classification methods are either similarity-based or model-based. Both kinds are supervised learning algorithms that need labeled time series to obtain a reliable classifier, but labeled data are difficult to obtain, and a classifier trained only on the small initial labeled set has very low accuracy. Unlabeled time series, by contrast, are easy to obtain. Semi-supervised methods, which train a classifier on information from both labeled and unlabeled data, have therefore become a focus of research.

This thesis studies semi-supervised classification of time series. Because a hidden Markov model (HMM) trained on only a small number of labeled time series has low classification accuracy, we discuss how to use an iterative self-training process to enlarge the labeled dataset and then train the HMM on the enlarged dataset to obtain a more accurate and reliable model. We further discuss how to enlarge the labeled dataset with an iterative co-training process, in which an HMM and a one-nearest-neighbor (1-NN) classifier serve as the two base classifiers: in each iteration, the HMM and the 1-NN classifier each select some unlabeled series and label them. Because some of these series may be labeled incorrectly, an editing method based on rough sets is introduced to deal with them. Finally, linear neighborhood propagation (LNP) is improved by using the clustering result of rough-set-based K-means, which makes the constructed neighborhood graph more reasonable.

Experimental results on the UCR time series datasets show that both self-training and co-training improve classification accuracy.
On the Synthetic Control dataset, for example, with four labeled series per class, self-training and co-training increase accuracy by 8.11% and 15.19%, respectively. The improved LNP based on rough K-means clustering (K = 4) is 7.24% more accurate than the original LNP.
Keywords/Search Tags: Semi-supervised learning, Hidden Markov model, Self-training, Co-training, Linear neighborhood propagation
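The self-training and co-training loops described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Training HMMs is out of scope for a short example, so a 1-NN classifier with Euclidean distance serves as the self-training base learner, and a nearest-centroid classifier stands in for the HMM in co-training. Confidence is assumed to be the distance to the nearest training series or class centroid. The helper names (`nn_predict`, `centroid_predict`, `move_most_confident`, `self_train`, `co_train`) and the toy data are hypothetical, and the rough-set editing step is not sketched.

```python
import numpy as np


def nn_predict(X_train, y_train, X):
    """1-NN with Euclidean distance; returns (labels, distance to the nearest training series)."""
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)  # shape (len(X), len(X_train))
    idx = d.argmin(axis=1)
    return y_train[idx], d[np.arange(len(X)), idx]


def centroid_predict(X_train, y_train, X):
    """Nearest class centroid: a hypothetical stand-in for the per-class HMMs."""
    classes = np.unique(y_train)
    cents = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    return classes[idx], d[np.arange(len(X)), idx]


def move_most_confident(predict, X_lab, y_lab, X_unl, k):
    """Label the k unlabeled series with the smallest distance and move them to the labeled pool."""
    labels, dist = predict(X_lab, y_lab, X_unl)
    pick = np.argsort(dist)[:k]  # smaller distance = more confident
    X_lab = np.vstack([X_lab, X_unl[pick]])
    y_lab = np.concatenate([y_lab, labels[pick]])
    return X_lab, y_lab, np.delete(X_unl, pick, axis=0)


def self_train(X_lab, y_lab, X_unl, per_round=1):
    """Self-training: one base classifier enlarges its own labeled set each round."""
    while len(X_unl) > 0:
        X_lab, y_lab, X_unl = move_most_confident(nn_predict, X_lab, y_lab, X_unl, per_round)
    return X_lab, y_lab


def co_train(X_lab, y_lab, X_unl, per_round=1):
    """Co-training: two different base classifiers take turns labeling for a shared pool."""
    while len(X_unl) > 0:
        for predict in (nn_predict, centroid_predict):
            if len(X_unl) > 0:
                X_lab, y_lab, X_unl = move_most_confident(predict, X_lab, y_lab, X_unl, per_round)
    return X_lab, y_lab


# Toy data (not the UCR data): flat series vs. increasing-trend series.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)


def flat(n):
    return rng.normal(0.0, 0.3, (n, 60))


def trend(n):
    return rng.normal(0.0, 0.3, (n, 60)) + 3.0 * t


X_lab = np.vstack([flat(2), trend(2)])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.vstack([flat(20), trend(20)])
Xs, ys = self_train(X_lab, y_lab, X_unl)
Xc, yc = co_train(X_lab, y_lab, X_unl)
print(len(ys), len(yc))  # 44 44
```

The enlarged pool (`Xs`, `ys` or `Xc`, `yc`) would then be used to train the final classifier. A practical run would usually stop after a fixed number of rounds or once confidence falls below a threshold, rather than labeling every unlabeled series as this sketch does.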
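The abstract does not spell out its rough-set K-means variant, so the following is only a sketch in the style of the rough K-means of Lingras and West, assumed rather than taken from the thesis. A point whose distance to another centroid is within a factor `eps` of its distance to the nearest centroid goes into the boundary (upper approximation) of every such cluster; otherwise it goes into the lower approximation of its nearest cluster. Each centroid is then a weighted mix of its lower-approximation mean and its boundary mean. The distance-ratio test is one of several variants in the literature; the function name `rough_kmeans` and the values `w_lower=0.7` and `eps=1.2` are hypothetical, and how the thesis turns these clusters into the LNP neighborhood graph is not shown.

```python
import numpy as np


def rough_kmeans(X, k, w_lower=0.7, eps=1.2, n_iter=100, seed=0):
    """Rough K-means sketch; parameter values are assumptions.

    Returns (centroids, lower, boundary), where lower[j] and boundary[j]
    are boolean masks over the rows of X.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
        nearest = d.argmin(axis=1)
        d_min = d[np.arange(len(X)), nearest]
        # Clusters whose distance is within a factor eps of the nearest one.
        close = d <= eps * np.maximum(d_min, 1e-12)[:, None]
        ambiguous = close.sum(axis=1) > 1  # point lies near several centroids
        lower = np.zeros((k, len(X)), dtype=bool)
        boundary = np.zeros((k, len(X)), dtype=bool)
        lower[nearest[~ambiguous], np.flatnonzero(~ambiguous)] = True
        boundary[:, ambiguous] = close[ambiguous].T
        new = centroids.copy()
        for j in range(k):
            has_l, has_b = lower[j].any(), boundary[j].any()
            if has_l and has_b:
                new[j] = (w_lower * X[lower[j]].mean(axis=0)
                          + (1 - w_lower) * X[boundary[j]].mean(axis=0))
            elif has_l:
                new[j] = X[lower[j]].mean(axis=0)
            elif has_b:
                new[j] = X[boundary[j]].mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, lower, boundary


# Toy data: three blobs plus one point placed between two of them.
rng = np.random.default_rng(1)
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
X = np.vstack([rng.normal(c, 0.2, (15, 2)) for c in centers] + [np.array([[2.0, 0.0]])])
cents, lower, boundary = rough_kmeans(X, k=3)
print(boundary[:, -1].sum())  # clusters whose boundary contains the midpoint (2, 0)
```

The lower/boundary split is what distinguishes this from ordinary K-means: points in the boundary are treated as uncertain, which is the kind of information a neighborhood-graph construction could use to avoid linking series across clusters.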