
Research on Dimensionality Reduction Algorithm of scRNA-seq Data Based on Generative Adversarial Networks and Autoencoder

Posted on: 2022-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: P J Wang
GTID: 2480306329974089
Subject: Computer application technology
Abstract/Summary:
Single-cell transcriptome sequencing (scRNA-seq) is a technology that has emerged in recent years and attracted wide attention. It performs high-throughput sequencing of messenger RNA (mRNA) at the single-cell level. The principle is to isolate single cells from a multicellular organism, obtain the small amount of transcriptome mRNA in each cell, amplify that mRNA with a high-efficiency amplification technique, and then perform high-throughput sequencing on the processed material. Single-cell sequencing can sequence the genome or RNA of a single cell and provide the RNA expression profile of each cell separately. Single-cell RNA sequencing can find rare cells in heterogeneous cell populations, which greatly facilitates research in immunology, genetics, and oncology. Tumor cell heterogeneity may arise from accumulated mutations in genetic material, but tumor cells of the same type may also differ in gene and protein expression levels in the same environment. Single-cell RNA sequencing therefore greatly facilitates the study of tumor cell heterogeneity and of the development of tumor drug resistance.

Dimensionality reduction is an important step before clustering scRNA-seq data and an important strategy for processing such data. scRNA-seq datasets are large, and each cell has many attributes. Reducing the dimensionality effectively eliminates redundant and invalid features and allows the cells to be visualized in a two-dimensional plane. Dimensionality reduction is a method of preprocessing high-dimensional data: it maps the data from the original high-dimensional space to a low-dimensional space while preserving as much as possible of the key attributes of the original data. Its purpose is to retain the most important features of the high-dimensional data and to remove noise and unimportant features during the mapping.

The main research contents of this thesis are as follows. First, it introduces the background and significance of scRNA-seq, describes the current state of research on data analysis in single-cell sequencing, and analyzes the problems and challenges that remain to be solved in analyzing single-cell sequencing data. Next, it introduces the background of next-generation sequencing data and of dimensionality reduction for second-generation sequencing data. It then introduces the origin, basic principles, and current development status of the two frameworks used in the model of this thesis: the generative adversarial network (GAN) and the autoencoder (AE). It then analyzes in depth the classic and well-performing algorithms in the field of single-cell sequencing data processing. It describes in detail the clustering algorithm k-means for processing scRNA-seq data and the deep clustering algorithm DEC, which combines dimensionality reduction and clustering. It also describes in detail t-SNE (t-distributed stochastic neighbor embedding), ClusterGAN (cluster generative adversarial network), and DRA (a dimensionality reduction model based on an adversarial variational autoencoder).

This thesis then proposes a new model for scRNA-seq data based on generative adversarial networks and the autoencoder, named GAAE (Generative Adversarial Autoencoder Networks). The innovations of GAAE are:

1. Compared with conventional dimensionality reduction algorithms for single-cell RNA sequencing data, GAAE replaces the traditional mean squared error (MSE) loss function with a combination of the loss function of the zero-inflated negative binomial (ZINB) autoencoder model and the MSE loss function, which denoises scRNA-seq data better.
2. It applies the GAN, a neural network used in the field of image processing, to the processing of single-cell RNA sequencing data in bioinformatics.
3. It incorporates the principle of the variational autoencoder into GAAE, aiming to extract data features during dimensionality reduction while retaining the global and local features of the original data.
4. It feeds the data produced by the generator of the generative adversarial network to the Encoder so as to fit the data denoised by the autoencoder, and trains on the fake data produced by the generative adversarial network together with the real sequencing data to improve the model's ability to extract latent features.
5. After the model is pre-trained, it clusters with the k-means algorithm to obtain the initial dimensionality-reduced latent-layer data and the initial cluster centers. Drawing on the idea of k-means, it then trains the Encoder separately with a new loss function, so that the dimensionality-reduced latent layer better separates the cell populations into clusters.

Finally, this thesis processes the selected scRNA-seq datasets and compares the GAAE model with the classic and well-performing algorithms for scRNA-seq data processing; the comparison supports the accuracy and efficiency of the GAAE model. The experiments show that the combination of GAAE and k-means is well suited to unsupervised learning tasks on scRNA-seq data. The results of GAAE + k-means are comparable to those of the deep clustering algorithms for scRNA-seq data proposed in recent years, and are better on some scRNA-seq datasets.
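
As an illustration of the first innovation, the following is a minimal sketch of a ZINB negative log-likelihood combined with an MSE term. The abstract does not give the exact formulation used in GAAE, so everything here is an assumption: the mean/dispersion parameterization of the negative binomial, the per-gene decoder outputs `mu`, `theta`, and `pi`, the separate reconstruction `recon`, and the hypothetical weight `mse_weight`. It uses only the Python standard library and is not the thesis implementation.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial (mean mu, dispersion theta)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under a zero-inflated negative binomial.

    pi is the probability of an extra "dropout" zero, with 0 <= pi < 1.
    """
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from dropout or from the negative binomial itself.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A positive count can only come from the negative binomial component.
    return -(math.log(1.0 - pi) + log_nb)


def combined_loss(counts, mu, theta, pi, recon, mse_weight=1.0):
    """Mean ZINB NLL plus a weighted MSE term (the weighting scheme is an assumption)."""
    n = len(counts)
    zinb = sum(zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, mu, theta, pi)) / n
    mse = sum((x - r) ** 2 for x, r in zip(counts, recon)) / n
    return zinb + mse_weight * mse
```

In a real model these quantities would be tensors produced by the decoder and the loss would be written in a deep learning framework. The scalar version above is only meant to show how the zero-inflation term treats zero counts differently from positive counts, which is the property that motivates using ZINB on sparse scRNA-seq count matrices.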
Keywords/Search Tags:single cell, dimensionality reduction, clustering, generative adversarial networks, autoencoder