
Long Time Series Clustering And Its Applications On Stock Price

Posted on: 2012-09-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J H Sun  Full Text: PDF
GTID: 1109330467467556  Subject: Information Science
Abstract/Summary:
As China's market economy develops and matures, public financial awareness and interest in investment are growing, and more and more investors are turning to the stock market. Investors seek to maximize returns and minimize risk, so understanding the market and analyzing individual stocks occupy a pivotal position in the investment process. In recent years the stock market has continued to expand and the number of listed companies has risen sharply. Faced with thousands of different stocks, investors cannot analyze each one individually; a suitable portfolio can be chosen only once the stocks have been classified. Cluster analysis is an effective way to guide securities investment: it can reveal groups of similar stocks, helping investors understand the general characteristics and trends of those stocks, determine the scope of investment, and choose favorable investment opportunities.

The data generated by a stock market system fall into two categories: market data and customer transaction data. Market data are generated in the course of trading and include the opening price, closing price, trading volume, and so on. Fluctuation in stock prices is what analysts and investors care about most, because it reflects changes in a stock's long-term trend. A stock price is a set of chronologically ordered data, known as a time series. It records how a market variable changes over a period of time, and it generally exhibits some continuity, regularity, and correlation. From a data mining perspective, stock price time series are characterized by large data volume and high dimensionality.
At the same time, periodicity, randomness, and trend are also characteristics of time series data.

With the development of data mining, pattern recognition, and other computer technologies, data mining techniques for massive databases have emerged that can help us find important information and knowledge in large volumes of time series data and thus make the right decisions. Clustering algorithms are an important tool for knowledge discovery in data mining. Clustering is a method of grouping physical or abstract objects by similarity, also known as "unsupervised classification." It is a commonly used data analysis tool whose purpose is to divide the target data into groups so that the data within each group are as similar as possible while different groups differ significantly. Because stock time series are periodic, random, and trending, researchers have proposed a number of clustering methods for time series, which can be divided into three categories: approaches based on the original data, model-based approaches, and approaches based on feature extraction. These methods are designed for the time series of a single stock and cluster its sub-sequences: the series is first segmented into many sub-sequences, which are then clustered.

The clustering objects in this dissertation are instead a set of time series, each of which is treated as a single, indivisible object. Stock price time series are often very long and are called long time series here; the number of data points in each series can reach thousands or even millions.
Stock price time series have their own characteristics and change according to certain rules; describing each series directly by its data points often fails to reflect these characteristics and also makes clustering difficult.

Building on previous studies and on the characteristics of stock price time series, this dissertation extracts eleven whole-sequence features: the largest Lyapunov exponent, the total power spectrum, several time-domain characteristics (peak-to-peak value sum of squares, peak value, variance, kurtosis, skewness), a trend coefficient, a period-term coefficient, the autocorrelation coefficient, the partial correlation coefficient, and so on. These whole-sequence features are used to redescribe the long stock time series formed from closing prices. A new clustering algorithm, CURBSC, is then proposed and used to cluster the closing-price time series redescribed by their whole-sequence features.

This dissertation is divided into 6 parts, whose main contents are as follows.

1. Summary of relevant theory and literature

This chapter discusses the elementary theory and methods involved in this dissertation. Stock analysis techniques can be divided into two kinds, qualitative analysis and quantitative analysis, also called fundamental analysis and technical analysis. Time series clustering belongs to the technical analysis category, whose foundation is the theory of stock market fluctuation. The chapter first discusses efficient market theory and fractal market theory, then analyzes and compares the concepts, classification, and major techniques of time series clustering analysis, and proposes a more effective clustering method for long time series.

2. Redescription of long time series

Because long time series contain huge amounts of data and have very high dimensionality, it is difficult to define a unified similarity measure for time series from different domains; applying ordinary time series mining tools directly to the raw long time series is therefore inefficient. The primary problem in long time series clustering is how to redescribe the series. This chapter summarizes the methods, functions, and selection principles of time series redescription and points out that extracting features of a time series is a good way of describing the original series: it retains the information in the original series while reducing the computational complexity of clustering, which improves the efficiency of mining long time series.

3. Preprocessing of long stock time series: denoising

A time series is composed of a low-frequency trend component, a periodic component, and high-frequency small fluctuations; the last of these are the noise. Flaws in how stock price indices are compiled, the behavior of institutions and large investors, and many external factors cause intense market fluctuation, so stock prices (and stock indices) are highly noisy. Random noise is clearly harmful to stock market forecasting, so long stock time series must be preprocessed, and denoising is one part of that preprocessing. The denoising methods most often used for stock price time series are based on the Fourier transform and the wavelet transform. In recent years wavelet denoising has been widely used because of its effectiveness, and it is the method applied here.
This chapter first analyzes the basic principles and methods of wavelet denoising, then discusses in detail the denoising method based on nonlinear wavelet transform thresholding, and sets the related key parameters.

4. Whole-sequence features of long time series

To cluster entire sequences, we first extract features from each series, then reassemble the extracted features into a new series, and cluster the reassembled series. This reduces the dimensionality of the series and also establishes a correspondence between samples and attribute variables, so that general-purpose clustering methods can be applied to time series. This chapter first summarizes the whole-sequence features of long stock time series, then discusses a wavelet-analysis-based method for extracting them. Finally, based on this method, Matlab programs are used to extract the whole-sequence features of a stock's closing-price time series.

5. A new hybrid clustering algorithm

The new hybrid algorithm is an improved CURE clustering algorithm. This chapter first describes the CURE algorithm and analyzes its advantages and disadvantages. To address the defects found in CURE, a new hybrid clustering algorithm is proposed, named CURBSC (Clustering Using Representative Based on Subtractive Clustering), and its flow chart is given. To verify the feasibility of CURBSC, clustering simulations of CURE and CURBSC were run on data from three UCI databases, and the effectiveness and time complexity of the clustering results were analyzed.

6. Case study on clustering long stock time series

Building on the previous chapters, forty long stock time series of closing prices downloaded from Qilu Securities were processed and clustered.
The process includes denoising the original time series, extracting the whole-sequence features, normalization, and clustering. Finally, the proposed clustering algorithm is evaluated, covering the advantages and disadvantages of the method and directions for future research.

The major contributions of this work can be summarized as follows:

(1) Representation methods for time series are studied in depth. Because stock databases are huge and the time series are long, we propose using whole-sequence features to redescribe stock time series, compressing the massive database without reducing the information the data contain. This saves storage space and speeds up data processing, thereby improving the efficiency of the system.

(2) Time series clustering algorithms are discussed comprehensively, the main existing algorithms are compared, and a more effective clustering strategy for mining long time series is proposed. Because CURE-based algorithms cannot partition the data correctly, whereas subtractive clustering determines the number of clusters and the cluster centers adaptively without requiring the number of clusters in advance, we propose a new hybrid of the CURE and subtractive clustering algorithms to cluster the series represented by whole-sequence features.

(3) The clustering algorithm is applied to China's stock market: cluster analysis is performed on long stock time series, and the validity of the proposed hybrid algorithm is confirmed through empirical study.

This study will fill a gap in domestic time series data mining research on long time series clustering, and will provide a theoretical basis for further research on data mining of financial time series and its practical applications.
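As an illustration only, two of the ingredients summarized above can be sketched in Python: describing a whole price series by a handful of time-domain features, and letting subtractive clustering pick cluster centers without a preset cluster count. This is not the dissertation's Matlab implementation and it is not CURBSC itself. The function names, the choice of five features, the definition of peak value as the series maximum, the radius `ra = 0.5`, and the single stopping ratio `0.15` (a simplification of the usual accept/reject rule) are all assumptions made for this sketch.

```python
import math


def whole_sequence_features(prices):
    """Describe one non-constant price series by five time-domain features.

    Returns [peak value, variance, skewness, kurtosis, lag-1 autocorrelation].
    The feature set and definitions are illustrative assumptions.
    """
    n = len(prices)
    mean = sum(prices) / n
    dev = [p - mean for p in prices]
    var = sum(d * d for d in dev) / n          # population variance
    std = math.sqrt(var)
    skew = sum(d ** 3 for d in dev) / (n * std ** 3)
    kurt = sum(d ** 4 for d in dev) / (n * std ** 4)
    acf1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / (n * var)
    return [max(prices), var, skew, kurt, acf1]


def subtractive_centers(points, ra=0.5, stop_ratio=0.15):
    """Return indices of points chosen as cluster centers.

    Each point's potential is a sum of Gaussian contributions from all
    points; the highest-potential point becomes a center, its influence is
    subtracted from every potential, and the search repeats until the best
    remaining potential falls below stop_ratio times the first peak.
    Points are assumed to be scaled to a comparable range beforehand.
    """
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2               # wider radius for subtraction

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    potential = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)
    centers = []
    while True:
        best = max(range(len(points)), key=potential.__getitem__)
        peak = potential[best]
        if peak < stop_ratio * first_peak:
            break
        centers.append(best)
        # Reduce potentials near the new center so the next center lies elsewhere.
        potential = [
            pot - peak * math.exp(-beta * sq_dist(p, points[best]))
            for p, pot in zip(points, potential)
        ]
    return centers
```

In a pipeline of the kind described in the case study, each denoised closing-price series would first be mapped to its feature vector, the vectors would be normalized, and only then would a center-finding step such as `subtractive_centers` be applied to them.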
Keywords/Search Tags:long time series, clustering analysis, stock price, CURE algorithm, hybrid clustering algorithm