
Research on Cell Type Identification Methods for Single-cell Sequencing Data

Posted on: 2024-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: D J Zhang
GTID: 2530306923488714
Subject: Electronic information
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) is currently the most advanced technique for revealing cellular heterogeneity at the microscopic level. By capturing this heterogeneity, it enables precise classification of cell types and thereby advances genomics. Single-cell type identification has contributed significantly to health-related technology, developmental biology, the prevention and control of major diseases, precision medicine, and national strategic technological strength. However, characteristics of scRNA-seq data such as dimensionality explosion, technical noise, and dropout events limit further progress in the single-cell field. This dissertation therefore aims to overcome these limitations of scRNA-seq data, mine intercellular heterogeneity in fine detail, identify cell types accurately, and improve the performance of downstream analysis. It analyzes and addresses the limitations of single-cell data from four aspects: features, neighborhoods, clusters, and association networks. The specific research contents are as follows:

(1) To address dimensionality explosion and noise contamination at the feature level, this dissertation proposes a cell type identification method based on optimal-feature non-negative matrix factorization (OFNMF). The method consists of three parts: non-negative matrix factorization (NMF), similarity learning, and spectral clustering. First, the method measures differences in cell information to judge whether cell features have been sufficiently learned, and then adaptively determines the optimal number of reduced dimensions, so that the NMF dimensionality reduction is precise. Next, OFNMF builds a cell similarity learning framework to explore the relationships between cells. Finally, spectral clustering produces the cell clustering results. Experiments show that OFNMF overcomes noise contamination and achieves accurate dimensionality reduction, and thereby effective clustering.

(2) To address uneven spatial density of neighborhoods and coarse similarity construction, this dissertation proposes a cell type identification method based on similarity-corrected non-negative low-rank representation (SCNLRR). SCNLRR combines low-rank representation (LRR) with a graph regularization term based on locality-sensitive hashing (LSH). Through subspace mapping and manifold learning, it constructs an accurate similarity matrix while compensating for differences in neighborhood density. Finally, the cell similarity information learned by SCNLRR is applied to marker gene selection, and experiments verify the effectiveness of the method.

(3) To address fuzzy cluster boundaries and feature redundancy in cluster structures, this dissertation proposes a cell type identification method based on joint learning with adaptive total variation (JL-ATV). The method combines dimensionality-reduction learning with subspace segmentation and reconstruction, overcoming feature redundancy to obtain more effective descriptions of cell features; this improves both the accuracy and the interpretability of cell type identification. In addition, adaptive total variation (ATV) adaptively selects a smoothing scheme based on gradient information, learning the boundary characteristics between clusters while suppressing noise, so that correlations among cells within a population and heterogeneity between populations are captured more accurately. Experimental results show that JL-ATV identifies cell types more accurately while maintaining interpretability.

(4) To address data heterogeneity and the complexity of cell-to-cell relationships in association networks, this dissertation proposes a cell type identification method based on a consensus-guided graph autoencoder (scGAC). In scGAC, the raw data are preprocessed into multiple top-level feature sets, and feedback-driven feature learning is performed with a graph autoencoder (GAE). During learning, the new feature representation and a similarity matrix learned with a distance-fusion method guide each other, which effectively preserves the diversity of data structures. Finally, the cell relationships learned from the multiple top-level feature sets are integrated into a final similarity matrix that is used for downstream analysis. Experiments show that scGAC learns data structure more comprehensively than other methods and adapts to the complexity of cell-to-cell relationships.

The proposed methods have been applied to scRNA-seq data, and the experimental results show that they overcome limitations of scRNA-seq data, accurately identify key features, effectively preserve the multiple structures present in the data, and thus improve the accuracy of cell type identification. Ultimately, this provides strong data support for deeper analysis of gene expression states and gene structures at single-cell resolution. In addition, these methods provide information-technology support for research on cell development, disease prevention, and drug design, and for the study of hard-to-treat diseases at the cellular level.
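The NMF step in OFNMF factorizes a non-negative expression matrix X (genes x cells) into W (genes x k) and H (k x cells), with the columns of H acting as low-dimensional cell features. The sketch below uses the classic Lee and Seung multiplicative updates for the squared Frobenius-norm objective in plain Python. It assumes a fixed, hand-picked rank k and does not reproduce the thesis's adaptive choice of the optimal rank, its similarity learning, or its spectral clustering; the function names (`nmf`, `matmul`, `frobenius_error`) and the toy matrix are hypothetical.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    bt = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(x: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm ||X - WH||_F^2."""
    wh = matmul(w, h)
    return sum((xv - whv) ** 2 for xr, whr in zip(x, wh) for xv, whv in zip(xr, whr))


def nmf(x: Matrix, k: int, iters: int = 200, seed: int = 0, eps: float = 1e-9):
    """Lee-Seung multiplicative updates for min ||X - WH||_F^2 s.t. W, H >= 0.

    x: genes x cells, non-negative. Returns W (genes x k) and H (k x cells).
    The fixed rank k is an assumption; OFNMF selects its rank adaptively.
    """
    rng = random.Random(seed)
    n_genes, n_cells = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.random() + eps for _ in range(n_cells)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


if __name__ == "__main__":
    # Toy genes x cells matrix: cells 0-2 express genes 0-1, cells 3-5 express genes 2-3.
    X = [
        [5, 4, 5, 0, 0, 1],
        [4, 5, 4, 0, 1, 0],
        [0, 1, 0, 5, 4, 5],
        [1, 0, 0, 4, 5, 4],
    ]
    W, H = nmf(X, k=2)
    # Crude labels: each cell's dominant factor (argmax over the k rows of H).
    labels = [max(range(2), key=lambda r: H[r][j]) for j in range(6)]
    print(labels)
```

Taking the argmax of H is only a stand-in for a clustering step; OFNMF instead builds a cell similarity matrix from the learned features and applies spectral clustering to it.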
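Low-rank representation with a graph regularizer is commonly written in the generic non-negative form below, where X is the genes x cells expression matrix used as its own dictionary, Z is the cells x cells representation, E is a sample-specific error term, L = D - W is a graph Laplacian over cells, and lambda and beta are trade-off weights. This formulation is taken from the general LRR literature as an assumption; the abstract does not state SCNLRR's exact objective, its weights, or how the LSH graph and the similarity correction enter it.

```latex
\min_{Z,\,E}\; \|Z\|_{*} + \lambda\,\|E\|_{2,1} + \beta\,\operatorname{tr}\!\left(Z L Z^{\top}\right)
\quad \text{s.t.} \quad X = XZ + E,\;\; Z \ge 0,
\qquad
\operatorname{tr}\!\left(Z L Z^{\top}\right) = \tfrac{1}{2}\sum_{i,j} W_{ij}\,\|z_i - z_j\|_2^2
```

Here z_i is the i-th column of Z, so the graph term pulls the representations of neighboring cells together, and a cell similarity matrix is typically formed afterwards as (Z + Z^T)/2.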
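SCNLRR's graph regularization relies on locality-sensitive hashing to find each cell's neighbors. The abstract does not name the hash family, so the sketch below assumes random-hyperplane (sign) LSH for cosine similarity: cells that share a bucket in any hash table become candidate neighbors, exact cosine similarity is computed only on that short list, and the top-k candidates form a symmetric affinity matrix W. It then returns the unnormalized Laplacian L = D - W, the kind of matrix used in graph-regularization terms such as tr(Z L Z^T). The functions `lsh_knn_graph` and `laplacian` and the parameters `k`, `n_planes`, and `n_tables` are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lsh_knn_graph(cells, k=2, n_planes=4, n_tables=6, seed=0):
    """Symmetric kNN affinity matrix built from random-hyperplane LSH candidates.

    cells: list of feature vectors, one per cell.
    Returns W (n x n): cosine similarity on kNN edges, 0.0 elsewhere.
    """
    rng = random.Random(seed)
    dim, n = len(cells[0]), len(cells)
    candidates = [set() for _ in range(n)]
    for _ in range(n_tables):
        # One hash table: the bucket key is the sign pattern of n_planes random projections.
        planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        buckets = defaultdict(list)
        for i, x in enumerate(cells):
            key = tuple(sum(p * v for p, v in zip(plane, x)) >= 0 for plane in planes)
            buckets[key].append(i)
        # Cells that collide in a bucket become candidate neighbors of each other.
        for members in buckets.values():
            for i in members:
                candidates[i].update(j for j in members if j != i)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Exact cosine similarity only on the candidate list; keep the k most similar.
        best = sorted(candidates[i], key=lambda j: cosine(cells[i], cells[j]), reverse=True)[:k]
        for j in best:
            s = max(cosine(cells[i], cells[j]), 0.0)
            w[i][j] = w[j][i] = max(w[i][j], s)  # symmetrize
    return w


def laplacian(w):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    cells = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],
             [0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.0, 0.2, 1.0]]
    W = lsh_knn_graph(cells, k=2)
    for row in laplacian(W):
        print([round(v, 3) for v in row])
```

More hash planes make buckets smaller (fewer, more precise candidates), while more tables raise the chance that true neighbors collide at least once; the point of LSH here is to avoid computing all n^2 pairwise similarities on large datasets.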
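For reference, the widely used graph autoencoder (GAE) of Kipf and Welling (2016) encodes node features X with a two-layer graph convolutional network over an adjacency matrix A and reconstructs edges with an inner-product decoder. In a single-cell setting, A would be a cell-cell graph and X a cells x genes feature matrix. The abstract does not specify scGAC's encoder, decoder, loss, or its consensus and distance-fusion steps, so the equations below are background on the generic GAE rather than the thesis's model; W_0 and W_1 are learnable weight matrices and sigma is the logistic sigmoid.

```latex
\begin{aligned}
\tilde{A} &= \tilde{D}^{-1/2}\,(A + I)\,\tilde{D}^{-1/2}, \qquad \tilde{D}_{ii} = \textstyle\sum_{j} (A + I)_{ij},\\
Z &= \tilde{A}\,\operatorname{ReLU}\!\left(\tilde{A}\,X\,W_0\right) W_1,\\
\hat{A} &= \sigma\!\left(Z Z^{\top}\right),\\
\mathcal{L} &= -\sum_{i,j}\Big[A_{ij}\log \hat{A}_{ij} + \left(1 - A_{ij}\right)\log\!\left(1 - \hat{A}_{ij}\right)\Big].
\end{aligned}
```

The rows of Z are the learned cell embeddings; a similarity matrix computed from them can in turn be used to rebuild A, which is the general shape of the mutual guidance between feature learning and similarity learning described for scGAC.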
Keywords/Search Tags: Single-cell RNA sequencing data, Single-cell type identification, Non-negative matrix factorization, Low-rank representation, Graph autoencoder