Research On Characteristic Gene Selection And Cancer Classification Clustering Algorithm For Gene Expression Data

Posted on:2022-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:L X Zhang

Full Text:PDF

GTID:2554307070452844

Subject:Computer technology

Abstract/Summary:

Malignant tumors, commonly called cancer, have become a major disease threatening people's health and safety in recent years. Because the occurrence of cancer is often accompanied by gene mutations and the abnormal expression of normal genes, researchers can determine whether a patient has cancer by examining expression changes in the gene expression profile. As an effective indicator of gene activity, gene expression data has become a key resource for cancer research. Such data usually contains only a few hundred samples, while the number of genes can reach thousands, tens of thousands or more, and only a very small number of those genes are pathogenic genes related to cancer. The data is therefore typically small-sample, high-dimensional and highly redundant. Its dimension needs to be reduced in advance with a machine learning algorithm to obtain useful discriminative information for the subsequent tasks of characteristic gene selection, cancer classification and cluster analysis. Some methods based on matrix decomposition (such as PCA and LRR) have been proposed and applied to extract features from high-dimensional, highly redundant data. However, as data complexity increases, the shortcomings of these traditional methods prevent them from producing satisfactory results.

(1) We proposed a new PCA-based method called robust Laplacian supervised discriminative sparse principal component analysis (RLSDSPCA). Most existing PCA-based methods share a limitation: they do not combine robustness to outliers and noise, label information, sparsity and the capture of local geometrical structure in one objective function. To overcome this drawback, RLSDSPCA enforces the L2,1 norm on the error function and incorporates the graph Laplacian manifold into supervised discriminative sparse PCA. To evaluate its efficacy, we applied it to characteristic gene selection and tumor classification on gene expression data. The experimental results demonstrate that RLSDSPCA achieved the best performance.

(2) We proposed a new LRR-based method called block diagonal low rank representation based on Huber loss and ordinal locality (HOBLRR). The graph regularization term of most graph-regularized LRR methods considers only the local geometric structure of the original data and ignores ordinal locality. HOBLRR imposes the Huber loss on the error function of LRR to achieve robustness to noise and outliers, and introduces the preservation of both local geometry and ordinal locality into the graph regularization. In addition, a block diagonal regularizer is imposed on the low rank representation matrix so that a block diagonal matrix is sought directly. We applied the method to clustering of simulated data and to characteristic gene selection and cancer sample clustering on gene expression data. The final experimental results show that HOBLRR achieved the best performance.

(3) To make this work easier for other gene expression data researchers to use, we developed an online web server based on the Spring MVC framework that provides prediction services for cancer sample classification based on gene expression data.
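For readers unfamiliar with the building blocks named in the abstract, the following is a minimal NumPy sketch of three of them: the L2,1 norm of an error matrix, the Huber loss, and a graph Laplacian built from a k-nearest-neighbor graph. It is not the thesis's implementation. The function names, the column-per-sample layout, the column-wise L2,1 convention (some papers sum over rows instead), the binary kNN weights, and the default values of `delta` and `k` are all assumptions made for illustration.

```python
import numpy as np


def l21_norm(E):
    """L2,1 norm: sum of the Euclidean norms of the columns of E.

    Assumes one sample per column, so a single outlying sample contributes
    its norm once instead of its squared norm.
    """
    return float(np.sum(np.linalg.norm(E, axis=0)))


def huber_loss(E, delta=1.0):
    """Elementwise Huber loss, summed over all entries of E.

    Quadratic for |e| <= delta and linear beyond it, which limits the
    influence of large residuals.
    """
    a = np.abs(E)
    quadratic = 0.5 * a ** 2
    linear = delta * (a - 0.5 * delta)
    return float(np.sum(np.where(a <= delta, quadratic, linear)))


def knn_graph_laplacian(X, k=2):
    """Unnormalized graph Laplacian L = D - W of a symmetric binary kNN graph.

    X holds one sample per column (assumed layout). W[i, j] = 1 if j is among
    the k nearest neighbors of i or vice versa; D is the diagonal degree matrix.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances between samples (columns).
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(d2, np.inf)  # exclude self-neighbors
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize
    return np.diag(W.sum(axis=1)) - W
```

In a graph-regularized objective, a Laplacian like this typically enters as a trace term that penalizes representations differing between neighboring samples. How the thesis weights and combines such terms is not stated in the abstract.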

Keywords/Search Tags:

Characteristic gene selection, cancer classification and clustering, local geometric structure, supervised discriminative sparse principal component analysis, Huber loss, ordinal locality, block diagonal low rank representation

Related items

1	Pharmaceutical Large Infusion Foreign Matter Detection Method Based On Clustering Joint Sparse Representation
2	Research On Sparse Low-rank Representation Model And Its Application In Cancer Sequencing Data
3	Research On Image Classification And Extraction For Traumatic Brain Injury Based On Sparse Representation Model
4	Histopathological Images Classification Based On Discriminative Dictionary Learning
5	Research On Low-rank Representation Methods And Their Application In Cancer Sequencing Data
6	Classification Of Tumor Gene Expression Profiles Based On Sparse Theory
7	Research On Tumor Classification Algorithm Based On Sparse Representation
8	A Cancer Classification Method Fused From Training And Low-rank Representation Of Gene Expression Data
9	Recognition Of Non-small Cell Lung Cancer Based On Intelligent Analysis Of Multi-modal Data
10	Research On Cancer Subtypes Prediction Based On Subspace Clustering