Penalized Gaussian Mixture Model-Based High-dimensional Data Clustering

Posted on:2017-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:G J Zhu

Full Text:PDF

GTID:2180330503461410

Subject:Mathematics, probability theory and mathematical statistics

Abstract/Summary:

This thesis is devoted to clustering "high dimension, low sample size" data. The data are assumed to be drawn from a Gaussian mixture model (GMM) in which each component corresponds to a cluster. Variables are selected during the clustering procedure: the variables that carry important information (informative variables) are identified, and the data are then clustered on the basis of these variables. Both clustering and variable selection are studied within a Gaussian mixture model with a penalty function. Three penalty functions imposed on the means are investigated: the L₁ penalty, the adaptive L₁ penalty, and the adaptive hierarchical penalty. They give rise to three models, L₁-GMM, Adaptive-L₁-GMM, and Adaptive-H-GMM. The gap statistic is used to estimate the number of clusters, and the EM algorithm is used to estimate the parameters π_k^(s), μ_kp^(s), and σ_p^(s). Whether a variable is informative can be determined from μ_kp, and the tuning parameter λ is chosen by a modified BIC. The three models are applied to simulated data and to real gene expression data. All three perform well on the simulated data, in the sense that both the clustering results and the selected variables are consistent with the original data. On the gene expression data the three models perform differently, and Adaptive-H-GMM performs best: it selects 14 informative variables out of 300, which reduces the computational cost and the model complexity, and its clustering error rate is 4/72 (about 5.6%), which is acceptable.
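
The abstract does not give the update formulas, so the following is only a minimal sketch of how an L₁ penalty on the means typically leads to variable selection inside the EM algorithm. It assumes standardized data, a diagonal covariance shared by all components, and the soft-thresholding form of the M-step that is common in the penalized model-based clustering literature. The function names (soft_threshold, penalized_mean_update, informative_variables) and the argument layout are hypothetical, and the thesis may use a different update.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def penalized_mean_update(X, tau, sigma2, lam):
    """One L1-penalized M-step update of the component means (assumed form).

    X      : (n, P) standardized data, so the global mean of each variable is 0
    tau    : (n, K) posterior membership probabilities from the E-step
    sigma2 : (P,)   variances of the shared diagonal covariance
    lam    : float  tuning parameter (lambda) of the L1 penalty
    """
    n_k = tau.sum(axis=0)                        # effective cluster sizes, (K,)
    mu_tilde = (tau.T @ X) / n_k[:, None]        # unpenalized weighted means, (K, P)
    thresh = lam * sigma2[None, :] / n_k[:, None]
    return soft_threshold(mu_tilde, thresh)      # penalized means, (K, P)


def informative_variables(mu):
    """Indices p for which at least one component mean mu[k, p] is nonzero."""
    return np.flatnonzero(np.any(mu != 0.0, axis=0))
```

Under these assumptions, a variable whose mean is shrunk to zero in every component has the same distribution in all clusters and therefore does not contribute to the clustering, which is the sense in which the means decide whether a variable is informative. A larger λ removes more variables, which is why λ has to be chosen by a criterion such as the modified BIC mentioned in the abstract.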

Keywords/Search Tags:

Gap statistic, BIC, variable selection, Adaptive-H-GMM

Related items

1	Comparison And Analysis Of Variable Selection Methods In Classical Statistics And Machine Learning
2	Penalized Gaussian Mixture Model-Based High-dimensional Data Clustering
3	Study On Variable Selection In Balanced Longitudinal Model
4	Adaptive Variable Selection For Multiple Response Longitudinal Data
5	Research Of Group Variable Selection Based On Adaptive Elastic Net With Strongly Correlated Data
6	Comparison Of Several Methods For Generating Directed Acyclic Graph By Variable Selection
7	Likelihood Adaptive Punishment Variable Selection Method Research
8	Variable Selection In Single-index Models Via Adaptive LASSO
9	Research On The Advantages And Disadvantages Of Lasso And Its Improved Methods In Variable Selection
10	Study Of DNA Microarray Data Of Variable Selection Methods