Application of Denoising Autoencoder Combined with Gaussian Mixture Clustering in Cancer Subtype Classification of Multi-omics Data

Posted on: 2024-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhang
GTID: 2544307148481474
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Identifying the molecular subtypes of cancer is of great significance for the follow-up treatment of patients, and integrating multiple omics data types can yield more useful information than a single omics data type. Many studies have integrated multi-omics data to identify cancer subtypes, and some use autoencoders to capture low-dimensional representations of such data. However, it is difficult to obtain satisfactory generalization performance when learning in the deep structure of an autoencoder. At the same time, few methods consistently identify subtypes with survival differences across multiple cancers. To address these problems, this study constructed a deep learning framework that extracts important information from multi-omics data with a denoising autoencoder and identifies cancer subtypes robustly with a Gaussian mixture clustering model, so as to help clinicians provide personalized treatment plans for patients.

Methods: Six cancer types and three types of omics data were selected from the TCGA database. Each type of omics data was first screened by univariate Cox regression, its dimensionality was then reduced by unsupervised learning with a denoising autoencoder, and cancer subtypes were identified with a Gaussian mixture clustering model. The clustering performance of Gaussian mixture models built on five feature-reduction approaches was compared: principal component analysis, independent component analysis, autoencoder, denoising autoencoder, and univariate Cox screening without unsupervised learning. Survival curves of the resulting subtypes were drawn, and the difference in clustering performance between single-omics and multi-omics data was examined. Differential analysis and WGCNA were used to obtain the omics features significantly associated with the cluster subtypes of lung adenocarcinoma, and an elastic net prediction model was established to screen biomarkers of the two subtypes, which were verified in a GEO data set.

Results: For the optimal denoising autoencoder models, the L1 and L2 regularization parameters of the six cancers were as follows: liver cancer (0.001, 0.0001), low-grade glioma (0.001, 0.001), lung squamous cell carcinoma (0.0001, 0.001), colon cancer (0.0001, 0.001), ovarian cancer (0.001, 0.001), and lung adenocarcinoma (0.0001, 0.001). Compared with single-omics data, Gaussian mixture clustering scored better on all four evaluation indexes after the multi-omics data were reduced by the denoising autoencoder. Compared with principal component analysis, independent component analysis, autoencoder, and univariate Cox screening only, the Gaussian mixture clustering model built on denoising autoencoder features performed best in three cancers (liver cancer, low-grade glioma, and colon cancer). In the other three cancers (lung squamous cell carcinoma, ovarian cancer, and lung adenocarcinoma), it achieved the best score on three evaluation indexes. The survival curves of the subtypes differed significantly in all cancers (P<0.05). Differential analysis of the expression data of the two subtypes identified in lung adenocarcinoma screened out 228 differential features. WGCNA of the expression data yielded 12 modules, of which the turquoise module was identified as the core module. Intersecting all omics features of the turquoise module with the differential features gave 55 core features related to lung adenocarcinoma subtypes, and the 46 features shared by the TCGA and GSE68465 data sets were used to construct an elastic net prediction model for lung adenocarcinoma subtypes. The elastic net selected 28 biomarkers related to lung adenocarcinoma subtypes from the 46 omics features, and the survival curves of the subtypes predicted by the elastic net differed significantly in both data sets (P<0.05).

Conclusion: This study constructed a joint model for cancer subtype identification based on a denoising autoencoder and Gaussian mixture clustering. The results showed that multi-omics data contain more clustering information than single-omics data and give a better clustering result. The Gaussian mixture clustering model built after reducing multi-omics data with a denoising autoencoder clustered better than models built on the three other unsupervised dimensionality reduction methods (principal component analysis, independent component analysis, and autoencoder) and yielded cancer subtypes with survival differences. Dimensionality reduction with the denoising autoencoder also significantly improved clustering compared with features that were not reduced by this model. Based on the results of the denoising autoencoder combined with Gaussian mixture clustering, 28 biomarkers associated with the two cluster subtypes of lung adenocarcinoma were obtained and verified in the GSE68465 data set. In conclusion, the denoising autoencoder combined with a Gaussian mixture model has excellent clustering performance and can provide guidance in clinical practice and suggestions for the personalized treatment of patients.
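The clustering step described in the Methods can be illustrated with a small sketch. The abstract does not state which software or covariance structure was used, so everything below is an assumption made for illustration: the functions `e_step` and `fit_diag_gmm` are hypothetical names, the mixture uses diagonal covariances fitted by plain EM, the initialisation is a simple farthest-point rule, and the synthetic `features` merely stand in for the low-dimensional codes a denoising autoencoder would produce.

```python
import math
import random


def e_step(points, weights, means, variances):
    """Return per-sample posterior probabilities (responsibilities)."""
    resp = []
    for x in points:
        log_p = []
        for w, mu, var in zip(weights, means, variances):
            lp = math.log(w)
            for xt, mt, vt in zip(x, mu, var):
                lp -= 0.5 * (math.log(2 * math.pi * vt) + (xt - mt) ** 2 / vt)
            log_p.append(lp)
        top = max(log_p)  # subtract the max before exponentiating for stability
        probs = [math.exp(lp - top) for lp in log_p]
        total = sum(probs)
        resp.append([p / total for p in probs])
    return resp


def fit_diag_gmm(points, k, n_iter=50, var_floor=1e-6):
    """Fit a k-component diagonal-covariance Gaussian mixture with EM."""
    n, d = len(points), len(points[0])
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(math.dist(p, m) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = e_step(points, weights, means, variances)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            for t in range(d):
                mu = sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                var = sum(r[j] * (x[t] - mu) ** 2 for r, x in zip(resp, points)) / nj
                means[j][t] = mu
                variances[j][t] = max(var, var_floor)
    resp = e_step(points, weights, means, variances)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Synthetic stand-in for encoded multi-omics features: two separated groups.
rng = random.Random(0)
features = [[rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(30)]
features += [[rng.gauss(5.0, 0.5) for _ in range(3)] for _ in range(30)]

weights, means, variances, labels = fit_diag_gmm(features, k=2)
print(labels)  # one subtype label per sample
```

In a real analysis the rows of `features` would be one encoded vector per patient, and the resulting labels would then be compared by survival analysis as the abstract describes. The number of components and the covariance structure are choices the abstract leaves unspecified.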
Keywords/Search Tags: Autoencoder, Gaussian Mixture Clustering, Multi-omics Data, Cancer, Biomarker