New Development in Cluster Analysis and Other Related Multivariate Analysis Methods

Posted on:2012-12-16

Degree:Ph.D

Type:Thesis

University:State University of New York at Stony Brook

Candidate:Zhang, Shaonan

GTID:2469390011460607

Subject:Statistics

Abstract/Summary:

Cluster analysis is a multivariate analysis method aimed at (1) unraveling the natural groupings embedded within the data, and (2) dimension reduction. Because cluster analysis is widely applied across modern research and business fields, including machine learning, bioinformatics, medical image analysis, pattern recognition, market research and global climate research, many clustering algorithms have been developed to date. However, novel or special circumstances always call for better customized cluster analysis methods, which motivates this thesis.

This thesis consists of two parts. In the first part, we extend modern multiple-objective cluster analysis from a single set of features to multiple distinct sets of features by developing the novel compound clustering method and the constrained clustering method. We also develop a new statistic, the "complete linkage" R², which is used along with the well-known largest average silhouette to determine the optimal number of clusters in compound clustering. The novel compound and constrained clustering methods are illustrated through a gene microarray study with both gene expression data and gene function information.

In the second part of this thesis, we propose a novel algorithm for weighted k-means clustering. Weighted k-means clustering is an extension of k-means clustering in which a set of nonnegative weights is assigned to the variables. We first derive the optimal variable weights for weighted k-means clustering in order to obtain more meaningful and interpretable clusters. We then improve the current weighted k-means clustering method (Huh and Lim 2009) by incorporating our novel algorithm, which obtains variable weights that are guaranteed to be globally optimal, based on the method of Lagrange multipliers and the Karush-Kuhn-Tucker conditions. We first present the related theoretical formulation and the derivation of the optimal weights. We then provide an iterative algorithm to compute these optimal weights. Numerical examples on both simulated data and well-known real data are provided to illustrate our method. They show that our method outperforms the originally proposed method in terms of classification accuracy, stability and computational efficiency.
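
The abstract does not state the objective function or the weight formula, so the following is only a minimal sketch of the general idea of weighted k-means with a closed-form weight update, not the thesis's algorithm and not the Huh and Lim (2009) method. It assumes a commonly used objective: minimize the sum over variables j of w_j^beta * D_j, where D_j is the within-cluster sum of squares of variable j, subject to the weights summing to 1 with beta > 1. Setting the derivative of the Lagrangian to zero gives w_j proportional to D_j^(-1/(beta-1)); these weights are strictly positive, so the nonnegativity constraints in the Karush-Kuhn-Tucker conditions are inactive. The names weighted_kmeans, farthest_point_init and beta are hypothetical, and the deterministic farthest-point initialization is a choice made for this illustration only.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic start: X[0], then repeatedly the point farthest from the chosen centers."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)


def weighted_kmeans(X, k, beta=2.0, max_iter=100):
    """Sketch of k-means with per-variable weights (assumed objective, see lead-in).

    Returns (labels, centers, weights). Requires beta > 1.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centers = farthest_point_init(X, k)
    w = np.full(p, 1.0 / p)  # start from equal weights
    labels = None
    for _ in range(max_iter):
        # Assignment step: weighted squared Euclidean distance to each center.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2  # shape (n, k, p)
        new_labels = (diff2 * w ** beta).sum(axis=2).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable, so centers and weights are stable too
        labels = new_labels
        # Center step: mean of each non-empty cluster.
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
        # Weight step: closed-form minimizer under the sum-to-one constraint.
        D = ((X - centers[labels]) ** 2).sum(axis=0)  # per-variable dispersion
        D = np.maximum(D, 1e-12)  # guard against division by zero
        inv = D ** (-1.0 / (beta - 1.0))
        w = inv / inv.sum()
    return labels, centers, w
```

Under this assumed objective, a variable with small within-cluster dispersion receives a large weight, so variables that separate the clusters well dominate the distance, while noisy variables are down-weighted. Calling weighted_kmeans(X, k=2) on an (n, p) array returns the cluster labels, the cluster centers and the p variable weights.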

Keywords/Search Tags:

Method, Cluster analysis

Related items

1	Theoretical And Empirical Analysis Of The Regional Industry Cluster Competitiveness Evaluation
2	Research On IPO Pricing Method Of Biomedical Industry Based On Cluster Analysis
3	Study On The Method Of Risk Management In Development Project Cluster
4	Research And Application Of Technology Pricing Method
5	Evidence Study Of IPO Pricing In China A Share Market Based On The Method Of Cluster Analysis
6	Quantile Regression Method Research On Influencing Factors Of Financial Performance For Chinese Manufacturing Industry Listed Corporation
7	Manufacturing Clusters In Zhejiang Province In The Evolution Of Dynamic Factors Research
8	Research For Method Of Market Segmentation Based On Self-organizing Cluster
9	A Study On Upgrade Of Logistics Industry Cluster In Hunan
10	Competitiveness Of Yunnan Province Based On The GEM Model Biomedical Industry Cluster