Gene Expression Data Clustering Based On Non-negative Matrix Factorization And Sparse Representation Classification

Posted on:2018-04-16

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Li

Full Text:PDF

GTID:2310330518499103

Subject:Computer application technology

Abstract/Summary:

As human genome sequencing, biological pattern research, and DNA microarray technology have matured, vast amounts of gene expression data have been generated, characterized by high dimensionality and small sample sizes. How to mine valuable information from such data in order to understand the nature of the data, life processes, disease mechanisms, and gene functions and their interactions has become both a major challenge and a hot topic in bioinformatics research. Gene clustering has proved to be an important means of partitioning genes by function, so choosing an efficient clustering method is necessary. Sample classification is an effective auxiliary method for gene identification and disease diagnosis; its key step is accurately reducing dimensionality and extracting features from small, high-dimensional sample sets. On the basis of gene clustering and classification, this thesis studies non-negative matrix factorization and sparse representation, respectively.

Direction one: Non-negative matrix factorization serves as both a matrix decomposition and a clustering method. Because of the non-negativity constraints, the decomposition has a practical physical interpretation and captures the local features of the training samples well, which makes it valuable for research on gene expression data. Gene clustering is an effective way to mine valuable information from genes: genes with similar functions are studied through their expression levels. The intrinsic characteristics of genes are explored from two points of view: (1) Unlike traditional clustering methods that rely heavily on a similarity measure, non-negative matrix factorization, an effective data classification technique, does not depend on a similarity function to assess gene similarity and shows good results. (2) Basic non-negative matrix factorization is combined with K-means clustering in order to learn the internal structure of the gene data quickly. Both methods are applied to cluster analysis of yeast data; compared with basic non-negative matrix factorization, the proposed algorithm gives better clustering results.

Direction two: Sparse representation, a classification technique with high recognition accuracy and strong robustness, has drawn great attention from researchers. Its focus is not feature extraction but classifier design. As a result, the key to classifying gene expression data with sparse representation lies in the design of the classifier. This thesis discusses non-negative matrix factorization and sparse representation and makes the following contributions. First, gene expression data are characterized by "high dimensions, small samples, and differences between data samples", which skews the data seriously, so a sparse representation method using a data-balancing strategy is proposed. Second, traditional sparse representation classification ignores the intrinsic nonlinear correlations of gene expression data; therefore, using a similarity distance between genes, a similarity-based sparse representation is presented. Third, to address the slow processing of small, high-dimensional sample sets, a fast sparse representation method is proposed that greatly reduces running time without losing accuracy: the speedup reaches 32 times on the MIT data and 2 to 10 times on the other data sets. Fourth, considering the typical redundancy of gene expression data, a subspace sparse representation using non-negative matrix factorization significantly improves classification performance. Compared with traditional sparse representation, these methods improve both accuracy and robustness. Finally, experimental results demonstrate that the proposed algorithm achieves high accuracy compared with SRC, KSRC, CRC, MSRC, CRCp SOC, SVM and other algorithms.
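
The combination of basic non-negative matrix factorization with K-means described under Direction one can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not specify the update rule, the rank, or how the factors are passed to K-means, so the multiplicative updates, the row normalization, the toy data, and all function names (`nmf`, `kmeans`, `cluster_genes`, `make_toy_data`) are assumptions made for illustration only.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x samples) as W @ H.

    Assumption: plain multiplicative updates for the Frobenius-norm
    objective; the thesis may use a different variant.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def kmeans(X, k, n_iter=50, seed=0):
    """Basic Lloyd's K-means on the rows of X; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def cluster_genes(V, rank, k, seed=0):
    """Hypothetical pipeline: NMF first, then K-means on the gene loadings W."""
    W, H = nmf(V, rank, seed=seed)
    # Assumption: normalize each gene's loadings so that clustering reflects
    # the loading profile rather than the overall expression magnitude.
    W_norm = W / (W.sum(axis=1, keepdims=True) + 1e-9)
    return kmeans(W_norm, k, seed=seed), W, H


def make_toy_data(seed=0):
    """Synthetic 30 genes x 6 samples matrix with three expression patterns."""
    rng = np.random.default_rng(seed)
    patterns = np.array(
        [[5, 5, 0, 0, 0, 0], [0, 0, 5, 5, 0, 0], [0, 0, 0, 0, 5, 5]],
        dtype=float,
    )
    # Ten noisy copies of each pattern; the noise keeps V non-negative.
    return np.repeat(patterns, 10, axis=0) + rng.random((30, 6)) * 0.1
```

Calling `cluster_genes(make_toy_data(), rank=3, k=3)` returns one cluster label per gene together with the factors `W` and `H`. Running K-means on the low-rank loadings rather than on the raw expression matrix is the design choice being illustrated; on real data such as the yeast set mentioned above, the rank and the number of clusters would have to be chosen by the analyst.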

Keywords/Search Tags:

Gene Expression Data, Non-negative Matrix Factorization, Sparse Representation, High-dimensional and Small Samples

Related items

1	Representation Algorithm And Its Application Based On Non-negative Matrix Factorization Data
2	Non-negative Matrix Factorization Algorithm And Its Application
3	Research On Robust Double-constrained Matrix Factorization Method And Its Application In Gene Sequencing Data
4	Non-negative Matrix Factorization And Its Bioinformatic Applications
5	Research On Optimization Method Of Non-negative Latent Factor Analysis On High-dimensional And Sparse Matrix
6	Research On Integrated Non-negative Matrix Factorization Method For Network Pattern Mining Of Omics Data
7	The Research Of Sparse Non-negative Matrix Factorization
8	Nonnegative Matrix Factorization Based On Class Information And Sparse Representation
9	Research On Robust Sparse Non-negative Matrix Factorization Algorithm
10	Adaptively Non-negative Latent Factorization Of Tensors Based Dynamical Network Representation Model And Application