Clustering Methods In Data Mining With Its Applications

Posted on:2009-10-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:R F Yin

Full Text:PDF

GTID:1117360272488798

Subject:Statistics

Abstract/Summary:

Data mining is a new technology that has developed alongside database and artificial intelligence research. It is the process of extracting credible, novel, effective, and understandable patterns from databases. Owing to its tremendous business prospects, data mining has become one of the most popular research areas in database and information technology and has received increasing attention in recent years.

Cluster analysis is an important data mining technique used to find data segmentation and pattern information. By clustering the data, analysts can obtain the data distribution, observe the characteristics of each cluster, and study particular clusters further. In addition, cluster analysis often serves as a preprocessing step for other data mining operations. Cluster analysis has therefore become a very active research topic in data mining.

With the development of data mining, a number of clustering methods have been proposed. Recent studies of clustering methods in data mining come mostly from computer science and focus on clustering algorithms. Studies of clustering techniques from a statistical perspective, however, are relatively scarce.
Building on statistical theory, this dissertation seeks to combine statistical methods with computer algorithm techniques and to introduce well-established statistical methods, including factor analysis, correspondence analysis, and functional data analysis, into data mining.

The dissertation consists of six chapters, outlined as follows:

Chapter 1 is the introduction, which briefly presents the research background and issues, the contents and framework, and the contributions of the dissertation.

Chapter 2 first reviews data mining, the main clustering methods, and their recent advances, and then systematically analyzes these methods from three viewpoints: clustering criteria, cluster representation, and algorithm framework.

Chapter 3 improves the algorithm of the classical Q-mode factor model and puts forward a new clustering method for large-scale databases: the Q-Mode Factor Clustering Method. At the end of the chapter, the method is applied successfully to listed company board analysis.

Chapter 4, drawing on the ideas of Q-mode factor analysis and correspondence analysis, proposes the Correspondence Analysis Clustering Method, a new clustering approach in data mining. Clustering mobile communication consumption data with this method yields a segmentation of the mobile communication consumption market.

Chapter 5 establishes a general framework for time series clustering by virtue of the ideas and techniques of functional data analysis. Extending this method to the multivariate case resolves the problem of multivariate time series clustering. Finally, the proposed method is applied to portfolio risk diversification, and its validity is verified through bootstrap simulation.

Chapter 6 summarizes the dissertation, including the research conclusions, limitations, and directions for future research.

The main innovations of this dissertation are as follows:

1. By modifying the classical Q-mode factor model, we put forward the Q-Mode Factor Clustering Method, which dramatically reduces the time complexity of the algorithm.
2. We propose a new clustering approach, the Correspondence Analysis Clustering Method. The approach is computationally efficient, whereas computation is an obstacle in Q-mode factor analysis. Additionally, this approach overcomes the subjectivity of traditional correspondence analysis and avoids the loss of information.
3. In the Correspondence Analysis Clustering Method, we construct a standardized factor component matrix, solve for the factor scores in correspondence analysis, and for the first time introduce factor rotation into correspondence analysis. This work extends, to some extent, the methodology and theoretical system of correspondence analysis.
4. By virtue of the ideas and techniques of functional data analysis, we establish a general framework for time series clustering, under which many traditional static clustering methods can be applied to time series data.
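
The idea behind innovation 4 can be illustrated with a minimal sketch. It is not the dissertation's algorithm, whose details are not given in this abstract. It assumes that each time series is summarized by the least-squares coefficients of a small polynomial basis (a stand-in for the B-spline or Fourier bases more common in functional data analysis) and that plain k-means serves as the static clustering method. The function names `fit_basis`, `solve`, `kmeans`, and `cluster_series` are hypothetical and chosen only for illustration.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (M[i][k] - tail) / M[i][i]
    return x


def fit_basis(series, degree=2):
    """Least-squares coefficients of the basis 1, t, ..., t^degree on t in [0, 1].

    Assumption: a polynomial basis stands in for a proper functional basis.
    """
    n = len(series)
    ts = [i / (n - 1) for i in range(n)]
    X = [[t ** d for d in range(degree + 1)] for t in ts]
    k = degree + 1
    # Normal equations: (X^T X) c = X^T y
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(X[r][i] * series[r] for r in range(n)) for i in range(k)]
    return solve(A, b)


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on coefficient vectors; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_series(series_list, k, degree=2):
    """Turn each series into basis coefficients, then cluster the coefficients."""
    coefs = [fit_basis(s, degree) for s in series_list]
    return kmeans(coefs, k)
```

For example, calling `cluster_series(rising + falling, k=2)` on a few upward-trending and a few downward-trending series should place the two groups in different clusters, because their fitted slope coefficients differ in sign. Once every series is reduced to a fixed-length coefficient vector, any static clustering method can replace k-means in this sketch.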

Keywords/Search Tags:

Data Mining, Cluster Analysis, Statistical Method

Related items

1	Data Mining For Applied Research, Statistical Work
2	Research On The Classification Method Of Students' Physique In Vocational Colleges Based On Cluster Analysis
3	Study On The Application Of Data Mining In Academic Achievements Analysis
4	Analysis And Prediction Of Game Players Based On Data Mining Technology
5	Analysis In Benchmarking Based On Data Mining And Teaching Management Process Research For Local College
6	Research On The Application Of Data Mining In The Analysis Of Students' Scores
7	Predicting Students' Online Grades Based On Data Mining Technology
8	Research On The Application Of Data Mining Technology In Teaching Evaluation System In Higher Vocational Colleges
9	Application Research Of Data Mining In School Sports Annual Report
10	Research And Application Of Business Intelligence And Its Key Technologies In Statistical Work