Unsupervised Discriminant Analysis For Single Cell Transcriptomes Data And Its Application

Posted on: 2023-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q R Peng
GTID: 2530306842970199
Subject: Engineering
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) technologies enable the measurement of gene expression at the level of individual cells, providing a data basis for studying complex diseases and life activities at single-cell resolution. However, data biases caused by technical limitations, such as high noise and sparsity, pose substantial computational challenges for the development of analytical methods. To overcome these biases, this study proposes scDA, a self-representation learning model embedded in feature extraction for cell type recognition and annotation, which leverages the interdependency between the extraction of molecular features and the learning of sample relationships. The method uses a dimensionality reduction technique and sample self-representation to unify the two tasks of feature extraction and sample relationship learning in a single mathematical model. scDA thus accurately learns a cell-cell representation matrix and the corresponding metagene discriminant matrix, which can be used for research tasks such as single-cell clustering and annotation.

To validate the effectiveness of the proposed method, we performed two types of benchmark studies, on small-scale and large-scale datasets. On the small-scale datasets, scDA achieved significantly improved clustering accuracy compared with other methods, and we analyze and discuss the ability of the corresponding discriminant matrix to distinguish diverse cell types. The scDA method was then applied to large-scale datasets for cell type annotation. We verified that scDA can accurately label a large number of cells after training the model on a small number of cells, even without the prior guidance of cell annotations provided by the data authors, which indicates that scDA is well suited to large-scale datasets.

Finally, we applied scDA to scRNA-seq datasets from different platforms or sources, for example, a human pancreatic scRNA-seq dataset with an obvious batch effect and human bone marrow scRNA-seq data from different subjects. The results showed that scDA could accurately distinguish six cell types that differ in cellular abundance across multiple pancreatic datasets. On the bone marrow scRNA-seq dataset, the discriminant matrix learned by scDA helped illustrate the differentiation structure among the four cell lineages specified in the dataset. This structure was further confirmed by the high expression of known marker genes, showing that the discriminant metagenes are biologically interpretable. In medical research applications, the scDA method can therefore overcome batch effects between sequencing protocols or sources, providing strong support for cell type recognition and visualization studies.

In summary, this study proposes the scDA method and an scDA-centered single-cell data analysis pipeline for cell clustering and cell type annotation. We evaluated its performance on both tasks using a series of small-scale and large-scale benchmark datasets. We applied scDA to cross-platform and cross-source single-cell datasets and demonstrated that it can overcome the influence of confounding or batch factors in real-world research and provide accurate cell type recognition, visualization, and interpretability. These results further demonstrate that scDA has strong practical application value.
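The abstract does not give the scDA objective function, so the following is only a minimal sketch of the two generic ingredients it names, dimensionality reduction and sample self-representation, applied one after the other. scDA is described as unifying them in a single model, which this sketch does not attempt. The ridge-regularized objective, the function names, the parameter values, and the synthetic data (two groups of "cells" drawn from different low-dimensional subspaces, not real expression counts) are all assumptions made for illustration.

```python
import numpy as np

def reduce_dims(X, k):
    """Project cells (columns of X, genes x cells) onto the top-k left singular vectors."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k].T @ X  # k x n_cells low-dimensional ("metagene") representation

def self_representation(Y, lam=0.1):
    """Closed-form minimizer of ||Y - Y Z||_F^2 + lam * ||Z||_F^2 (assumed objective)."""
    G = Y.T @ Y  # cell-by-cell Gram matrix
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), G)

def affinity(Z):
    """Symmetric, non-negative cell-cell affinity with self-affinity removed."""
    W = (np.abs(Z) + np.abs(Z.T)) / 2
    np.fill_diagonal(W, 0.0)
    return W

def make_toy_data(rng, n_genes=50, n_per_type=20, dim=3, noise=0.01):
    """Hypothetical data: each cell type spans its own random low-dimensional subspace."""
    blocks = []
    for _ in range(2):
        basis = rng.normal(size=(n_genes, dim))
        coeffs = rng.normal(size=(dim, n_per_type))
        blocks.append(basis @ coeffs)
    X = np.hstack(blocks)
    X += noise * rng.normal(size=X.shape)
    labels = np.repeat([0, 1], n_per_type)
    return X, labels

rng = np.random.default_rng(0)
X, labels = make_toy_data(rng)
Y = reduce_dims(X, k=6)
W = affinity(self_representation(Y))

# Compare the average affinity between cells of the same type and of different types.
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(len(labels), dtype=bool)
within = W[same & off_diag].mean()
between = W[~same].mean()
print(f"mean affinity within types: {within:.4f}, between types: {between:.4f}")
```

In a subspace clustering workflow, an affinity matrix like `W` would typically be passed to a spectral clustering step to obtain cell groupings. Whether scDA does exactly this is not stated in the abstract.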
Keywords/Search Tags:single cell transcriptome, subspace clustering, discriminant analysis, cell representation, discriminative genes