| With the completion of human genome sequencing and the maturation of DNA microarray technology, vast amounts of gene expression data have been generated, characterized by high dimensionality and small sample sizes. Mining valuable information from such data in order to understand life processes, disease mechanisms, gene functions, and gene interactions has therefore become both a major challenge and a hot topic in bioinformatics research. Gene clustering has proved to be an important way to partition genes by function, so choosing an efficient clustering method is necessary. Sample classification is an effective auxiliary method for gene identification and disease diagnosis, and its key step is to reduce dimensionality accurately and extract features from small, high-dimensional sample sets. Focusing on gene clustering and classification, this paper studies non-negative matrix factorization and sparse representation, respectively. Direction one: Non-negative matrix factorization (NMF) serves both as a matrix decomposition and as a clustering method. Because of the non-negativity constraints, the decomposition result has a practical physical interpretation and captures the local features of the training samples well, which makes NMF valuable for the study of gene expression data. Gene clustering is an effective way to mine valuable information about genes: genes with similar functions are studied through their expression levels. The intrinsic characteristics of genes are explored from two points of view: (1) Unlike traditional clustering methods, which rely heavily on a similarity measure, NMF, an effective data classification technique, does not depend on a similarity function to assess gene similarity and still gives good results. (2) Basic NMF is combined with K-means clustering in order to uncover the internal structure of gene data quickly. Both methods are applied to cluster analysis of yeast data, and the proposed algorithm clusters better than basic NMF. Direction two: Sparse representation, a classification technique with high recognition accuracy and strong robustness, has attracted great attention from researchers. Its focus is not on feature extraction but on classifier design. Consequently, the key to classifying gene expression data with sparse representation lies in the design of the classifier. This paper discusses NMF and sparse representation and makes the following contributions. First, the basic characteristics of gene expression data are "high dimensionality, small sample size, and differences between samples", which leave the data seriously skewed; a sparse representation method using a data-balancing strategy is therefore proposed. Second, traditional sparse representation classification ignores the intrinsic nonlinear correlations in gene expression data; a new similarity-based sparse representation is therefore presented that uses a similarity distance between genes. Third, to address the slow speed on small, high-dimensional sample sets, a fast sparse representation method is proposed; experiments show that it greatly reduces running time without losing accuracy, with a speedup of up to 32 times on the MIT data and 2 to 10 times on the other data sets. Fourth, considering the typical redundancy of gene expression data, a subspace sparse representation using NMF significantly improves classification performance. Compared with traditional sparse representation, the proposed methods improve both accuracy and robustness. Finally, experimental results demonstrate that the proposed algorithms achieve high accuracy compared with SRC, KSRC, CRC, MSRC, CRCp SOC, SVM, and other algorithms. |
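
To make the idea of clustering genes with NMF concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes the standard Lee-Seung multiplicative updates for the Frobenius objective and assigns each gene (row) to the factor with the largest coefficient in W, rather than applying K-means to the factors. The toy matrix `V`, the function names, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x conditions) as V ~ W H.

    Uses multiplicative updates, which keep W and H non-negative as long
    as they start non-negative. Returns W, H and the initial error.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    initial_error = frobenius_error(v, w, h)
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h, initial_error


def cluster_rows(w):
    """Assign each row (gene) to the factor with the largest coefficient."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in w]


# Hypothetical toy data: genes 0-2 are expressed under conditions 0-1,
# genes 3-5 under conditions 2-3.
V = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
    [0.0, 0.0, 4.0, 4.0],
]

W, H, initial_error = nmf(V, rank=2)
final_error = frobenius_error(V, W, H)
labels = cluster_rows(W)
print("labels:", labels)
print("error before/after:", initial_error, final_error)
```

No similarity function between genes is computed anywhere in this sketch; the grouping comes directly from the non-negative factors. A K-means step, as combined with NMF in the paper, could instead be run on the rows of `W`.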