
Research On Spectral Clustering Methods And Their Applications In Financial Time Series Data Mining

Posted on: 2012-06-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Y Su
GTID: 1119330368985919
Subject: Management Science and Engineering
Abstract/Summary:
Data mining is one of the core techniques of business intelligence. In real-world applications, data mining techniques have been widely used in financial management, customer relationship management, workflow management, risk management, and so on. They contribute greatly to the success of enterprises in strategic decision making, cost control, and business collaboration.

Cluster analysis is one of the key components of data mining research. Clustering is the process of grouping a set of objects into classes of similar objects. A cluster is a collection of data objects that are similar to one another and dissimilar to the objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as stock data analysis, market segmentation, production supervision, and anomaly detection.

Spectral clustering is a novel clustering method based on spectral graph theory. Its main advantages are that it is easy to implement and can cluster data of arbitrary shape. Many studies have been devoted to spectral clustering. However, several important questions in its theory, algorithms, and real applications still need further study. These questions include: How can reasonable and stable cluster numbers be determined in spectral clustering? How can the informative eigenvectors be selected? Does the method actually compute a reasonable clustering from the point of view of matrix perturbation theory? What is the principle behind using component analysis for dimension reduction of univariate time series? How can spectral clustering be used to analyze real financial time series data?

This thesis therefore focuses on spectral clustering methods and their application in financial time series data mining, as follows:

(1) We propose stability-based methods for determining non-unique cluster numbers in spectral clustering. For a candidate cluster number k, we first use the index Ratio(k) to judge whether it is reasonable. Then, by varying the scaling parameter in the Gaussian function, we judge whether the reasonable cluster number k is also a stable one. This algorithm can determine cluster numbers for a given data set that are not only reasonable but also stable.

(2) For choosing informative eigenvectors in spectral clustering, we propose an algorithm called automatic selection of informative eigenvectors in spectral clustering (ASIESC). ASIESC differs from previous approaches in that it clearly distinguishes informative eigenvectors from uninformative ones, is easy to implement, and is more stable than existing algorithms.

(3) We use matrix perturbation theory to analyze the spectral clustering matrix used in the multiway normalized cut spectral clustering method. The results show that the multiway normalized cut spectral clustering method is reasonable from the point of view of matrix perturbation theory.

(4) We analyze the principle of dimension reduction for univariate time series via principal component analysis (PCA) from a linear algebraic point of view. Based on this theoretical analysis, we propose a spectral clustering method for univariate time series based on PCA. The main idea is that similarities among univariate time series are reflected by similarities among their coefficients under the same basis vectors of the linear space.

(5) We discuss the principle of using independent component analysis (ICA) to reduce the dimension of univariate time series. In particular, we analyze the impact of the ambiguity of the independent components on the clustering results. Based on this theoretical analysis, we propose a spectral clustering method for time series based on ICA. The algorithm first uses ICA to reduce the dimension, then estimates the cluster number via the generalized eigenvalues method, and finally clusters the feature data with the multiway normalized cut spectral clustering method.

(6) We use spectral clustering to analyze the comovement and stability of the global stock indices during the European sovereign debt crisis. First, we propose a cluster number determination method based on stability. Then we analyze the comovement of the global stock indices, and the differences between adjacent time stages, over eight stages: before the crisis, beginning, developing, spreading, uploading, adjusting, re-uploading, and recovering. Finally, we analyze the distribution of the global stock indices in the different time stages.

(7) We study the recognition of investment styles of Chinese open-end funds using the multiway normalized cut spectral clustering method and ICA. First, we use ICA to extract features. Then we estimate the cluster number via the generalized eigenvalues method and cluster the feature data with the multiway normalized cut spectral clustering method. Finally, we judge investment styles according to our investment style recognition method based on Sharpe's coefficients.
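
As a rough illustration of the pipeline in item (4), the following Python sketch reduces a set of univariate time series to PCA coefficients, builds a Gaussian similarity matrix with a scaling parameter, and clusters the series with a normalized spectral method. It is a minimal sketch under stated assumptions, not the thesis's algorithm: the function names, the synthetic data, and the choice of a common normalized variant (symmetrically normalized affinity followed by k-means on row-normalized eigenvectors) are all assumptions made here, and the multiway normalized cut formulation used in the thesis may differ in detail.

```python
import numpy as np


def pca_coefficients(series, n_components):
    """Project each series (one per row) onto the top principal directions."""
    centered = series - series.mean(axis=0)
    # Rows of vt are orthonormal basis vectors shared by all series.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def gaussian_affinity(features, sigma):
    """Gaussian similarity exp(-d^2 / (2 sigma^2)) with a zero diagonal."""
    sq_dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    return affinity


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(affinity, k):
    """Assumed variant: normalize the affinity, embed with top-k eigenvectors, run k-means."""
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(normalized)  # eigenvalues in ascending order
    embedding = eigvecs[:, -k:]
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    return kmeans(embedding, k)


def demo_labels(sigma=2.0, seed=0):
    """Cluster 20 synthetic series: 10 noisy sine waves and 10 noisy linear trends."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 50)
    group_a = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((10, 50))
    group_b = 2.0 * t - 1.0 + 0.1 * rng.standard_normal((10, 50))
    series = np.vstack([group_a, group_b])
    features = pca_coefficients(series, n_components=2)
    return spectral_clustering(gaussian_affinity(features, sigma), k=2)


if __name__ == "__main__":
    # Re-run with several scaling parameters to see whether the partition changes.
    for sigma in (1.0, 2.0, 4.0):
        print(sigma, demo_labels(sigma=sigma))
```

Re-running the clustering for several values of the scaling parameter, as in the final loop, is only a crude stand-in for the stability check described in item (1). The index Ratio(k) and the generalized eigenvalues method are not defined in this abstract, so they are not reproduced here, and the cluster number k is simply supplied by hand.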
Keywords/Search Tags: Data Mining, Spectral Clustering, Financial Time Series, Dimension Reduction, Component Analysis