
Clustering Methods In Data Mining With Its Applications

Posted on: 2009-10-09  Degree: Doctor  Type: Dissertation
Country: China  Candidate: R F Yin
GTID: 1117360272488798  Subject: Statistics
Abstract/Summary:
Data mining is a new technology that has developed alongside databases and artificial intelligence. It is the process of extracting credible, novel, effective, and understandable patterns from databases. Owing to its tremendous business prospects, data mining has become one of the most popular research areas in database and information technology, and it has received increasing attention in recent years.

Cluster analysis is an important data mining technique for discovering data segmentation and pattern information. By clustering the data, one can obtain the data distribution, observe the characteristics of each cluster, and study particular clusters further. In addition, cluster analysis often serves as a preprocessing step for other data mining operations. Cluster analysis has therefore become a very active research topic in data mining.

With the development of data mining, many clustering methods have been proposed. Recent studies on clustering methods in data mining come mostly from computer science and focus mainly on clustering algorithms. Studies of clustering techniques from a statistical perspective, however, are relatively scarce.
Based on statistical theory, this dissertation attempts to combine statistical methods with computer algorithm techniques and to introduce well-established statistical methods, including factor analysis, correspondence analysis, and functional data analysis, into data mining.

The dissertation consists of six chapters, whose main contents are outlined as follows:

Chapter 1 is the introduction. It briefly presents the research background and issues, the contents and framework, and the contributions of the dissertation.

Chapter 2 first reviews data mining, the main clustering methods, and their recent advances, and then systematically analyzes these methods from three viewpoints: clustering criteria, cluster representation, and algorithm framework.

Chapter 3 improves the algorithm of the classical Q-mode factor model and proposes a new clustering method for large-scale databases: the Q-Mode Factor Clustering Method. At the end of the chapter, the method is successfully applied to the board analysis of listed companies.

Chapter 4 builds on the ideas of Q-mode factor analysis and correspondence analysis to propose the Correspondence Analysis Clustering Method, a new clustering approach in data mining. Clustering mobile communication consumption data with this method yields a segmentation of the mobile communication consumption market.

Chapter 5 establishes a general framework for time series clustering by virtue of the ideas and techniques of functional data analysis. Extending this method to the multivariate case solves the problem of multivariate time series clustering. Finally, the proposed method is applied to portfolio risk diversification, and its validity is verified through bootstrap simulation.

Chapter 6 summarizes the whole dissertation, including the research conclusions, limitations, and directions for future research.

The main innovations of this dissertation are as follows:
1. By modifying the classical Q-mode factor model, we propose the Q-Mode Factor Clustering Method, which dramatically reduces the time complexity of the algorithm.
2. We propose a new clustering approach, the Correspondence Analysis Clustering Method. The approach is computationally efficient, removing the computational obstacle of Q-mode factor analysis. It also overcomes the subjectivity of traditional correspondence analysis and avoids loss of information.
3. In the procedure of the Correspondence Analysis Clustering Method, we construct a standardized factor component matrix, solve for the factor scores in correspondence analysis, and, for the first time, introduce factor rotation into correspondence analysis. Together, this work extends, to some extent, the methodology and theoretical system of correspondence analysis.
4. By virtue of the ideas and techniques of functional data analysis, we establish a general framework for time series clustering, under which many traditional static clustering methods can be applied to time series data.
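The abstract does not say how Chapter 3 reduces the cost of the classical Q-mode factor model. One standard trick, assumed here purely for illustration, avoids decomposing the n x n sample-similarity matrix Z Zᵀ: it shares its nonzero eigenvalues with the p x p matrix Zᵀ Z, and its eigenvectors can be recovered from those of the smaller matrix. The sketch below is hypothetical (the function name `q_mode_factor_clusters` and the rule "assign each sample to the factor with the largest absolute loading" are illustrative choices), not the dissertation's algorithm.

```python
import numpy as np


def q_mode_factor_clusters(X, n_factors):
    """Hypothetical sketch: cluster the rows (samples) of X by Q-mode loadings.

    Assumes the speed-up comes from eigen-decomposing the small p x p matrix
    Z.T @ Z instead of the n x n sample-similarity matrix Z @ Z.T.
    """
    X = np.asarray(X, dtype=float)
    # Normalize each sample to unit length, so Z @ Z.T holds the cosine
    # similarities between samples (the Q-mode similarity matrix).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Z = X / np.where(norms == 0, 1.0, norms)
    # Decompose the p x p matrix; it has the same nonzero eigenvalues as Z @ Z.T.
    eigvals, V = np.linalg.eigh(Z.T @ Z)
    order = np.argsort(eigvals)[::-1][:n_factors]
    V = V[:, order]
    # If Z.T Z v = lam v, then u = Z v / sqrt(lam) is a unit eigenvector of
    # Z Z.T, so the Q-mode loadings sqrt(lam) * u are simply Z @ v.
    loadings = Z @ V
    # Assign each sample to the factor on which it loads most heavily.
    labels = np.argmax(np.abs(loadings), axis=1)
    return labels, loadings
```

Because only Zᵀ Z is decomposed, the eigen-step costs O(p³) rather than O(n³), plus O(np²) to form the matrix, which is the kind of saving that matters when the number of records n far exceeds the number of variables p.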
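For reference, standard correspondence analysis of a nonnegative table N (for example, customers by consumption categories) can be computed from the SVD of the standardized residual matrix S = D_r^(-1/2) (P - r cᵀ) D_c^(-1/2), where P = N / n and r, c are the row and column masses. The sketch below returns row and column principal coordinates; rows with similar profiles receive nearby coordinates, which is what a subsequent clustering step could exploit. The abstract does not detail how the Correspondence Analysis Clustering Method builds its standardized factor component matrix or factor scores, so that part is deliberately not reproduced, and `correspondence_analysis` is a hypothetical name.

```python
import numpy as np


def correspondence_analysis(N, n_dims=2):
    """Standard correspondence analysis via SVD (illustrative sketch).

    Returns row principal coordinates F, column principal coordinates G,
    the principal inertias of the retained dimensions, and the total inertia.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    U, sigma, V = U[:, :n_dims], sigma[:n_dims], Vt[:n_dims].T
    # Principal coordinates: D_r^{-1/2} U Sigma and D_c^{-1/2} V Sigma
    F = U * sigma / np.sqrt(r)[:, None]
    G = V * sigma / np.sqrt(c)[:, None]
    # Total inertia equals the chi-squared statistic divided by n.
    total_inertia = (S ** 2).sum()
    return F, G, sigma ** 2, total_inertia
```

The row coordinates F are centered (their mass-weighted mean is zero), so clusters show up as groups of rows lying on the same side of the leading dimensions.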
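Innovation 3 introduces factor rotation into correspondence analysis, but the abstract does not name the rotation. Varimax, the most common orthogonal rotation, is assumed here. The sketch below is the usual SVD-based orthomax iteration applied to a generic loading matrix L; which matrix the dissertation actually rotates (its "standardized factor component matrix") is not described, so the function is shown on its own.

```python
import numpy as np


def varimax(L, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthogonally rotate a p x k loading matrix L (illustrative sketch).

    gamma=1 gives varimax; gamma=0 gives quartimax (orthomax family).
    Returns the rotated loadings and the k x k rotation matrix.
    """
    L = np.asarray(L, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the orthomax criterion w.r.t. the rotated loadings:
        # each column j is scaled by its own sum of squared loadings.
        target = LR ** 3 - (gamma / p) * LR * np.sum(LR ** 2, axis=0)
        # The best orthogonal R is the polar factor of L.T @ target.
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt
        d_old, d = d, np.sum(s)
        # Stop once the criterion proxy no longer improves noticeably.
        if d_old != 0 and d < d_old * (1 + tol):
            break
    return L @ R, R
```

Since R is orthogonal, each row's communality (sum of squared loadings) is unchanged; the rotation only redistributes it so that each row loads mainly on one factor, which makes a "largest loading" cluster assignment less ambiguous.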
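One common way to realize the idea in innovation 4, assumed here rather than taken from the dissertation, is to represent each time series by the coefficients of a least-squares fit on a fixed basis and then run any static clustering method on those coefficient vectors. For multivariate series, the coefficient vectors of all variables are concatenated. The sketch uses a Legendre polynomial basis (`numpy.polynomial.legendre.legvander`) and plain k-means with a deterministic farthest-point start; the basis, `n_basis`, and the function names `basis_coefficients` and `kmeans` are illustrative assumptions.

```python
import numpy as np


def basis_coefficients(series, n_basis=4):
    """Map each series to basis coefficients (hypothetical helper).

    series: array of shape (n_series, n_times) or (n_series, n_vars, n_times).
    Returns an array of shape (n_series, n_vars * n_basis).
    """
    Y = np.asarray(series, dtype=float)
    if Y.ndim == 2:
        Y = Y[:, None, :]              # treat univariate series as 1 variable
    n_series, n_vars, n_times = Y.shape
    t = np.linspace(-1.0, 1.0, n_times)
    B = np.polynomial.legendre.legvander(t, n_basis - 1)  # (n_times, n_basis)
    # Fit every curve at once: B @ coef approximates each column of `flat`.
    flat = Y.reshape(-1, n_times).T
    coef, *_ = np.linalg.lstsq(B, flat, rcond=None)
    # Rows are ordered (series, variable); concatenate variables per series.
    return coef.T.reshape(n_series, n_vars * n_basis)


def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's k-means with farthest-point initialization."""
    Z = np.asarray(Z, dtype=float)
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

Swapping `kmeans` for hierarchical clustering or any other static method leaves the basis-fitting step unchanged, which is what allows traditional static clustering methods to be applied to time series data.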
Keywords/Search Tags: Data Mining, Cluster Analysis, Statistical Method