Cluster Analysis For High-Dimension Sparse Sc-RNA Sequencing Data

Posted on: 2019-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: K X Jin
GTID: 2370330572454103
Subject: Statistics
Abstract/Summary:
In molecular biology, unsupervised clustering algorithms are of great significance for distinguishing cell subpopulations that carry specific biological meaning. The invention of single-cell RNA sequencing technology has greatly promoted the development of molecular biology, but it also poses a new challenge for conventional unsupervised clustering algorithms. The difficulties result mainly from the high sparsity and heavy noise of the sequencing data. BOSSA normalization was proposed to address this problem: it is based on latent variable normalization and eliminates technical noise while preserving biological differences. However, like most unsupervised clustering algorithms, its efficiency and accuracy depend on appropriately chosen parameters, which limits its generality to some extent.

To solve these problems, a new method for analyzing single-cell RNA sequencing data is proposed: Two-step Unsupervised Clustering (TSUC). Besides reducing the parameter dependence of the original algorithms, TSUC can cluster cells with different degrees of biological heterogeneity. In step one, using a set of given Gaussian kernel functions, TSUC applies t-SNE to reduce dimensionality and automatically searches for cluster centers based on density clustering. This yields relatively large cell subpopulations that are stably distinguished. Whereas step one recognizes highly heterogeneous cell subpopulations, step two re-clusters some of the subpopulations obtained in step one and can identify less heterogeneous cell populations. TSUC matches the general relationships among cells in molecular biology well, and the results of real data analysis confirm the efficiency of TSUC.
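
The two-step idea can be illustrated with a minimal sketch. It is not the thesis implementation, whose details the abstract does not give. It assumes the cells have already been embedded in a low-dimensional space (for example by t-SNE, which is omitted here), interprets "density clustering" as a density-peaks style rule with a Gaussian kernel, and replaces the automatic center search with a fixed, user-supplied number of centers. The function names `density_peaks` and `two_step`, the bandwidths, and the toy points are all hypothetical.

```python
import math

def density_peaks(points, bandwidth, n_centers):
    """Label points with a density-peaks style rule (illustrative sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density: nearby points contribute more.
    rho = [sum(math.exp(-(dist[i][j] / bandwidth) ** 2)
               for j in range(n) if j != i) for i in range(n)]
    # Visit points from densest to sparsest.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n   # distance to the nearest denser point
    parent = [-1] * n   # index of that denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers are points that are both dense and far from any denser point.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_centers]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Every other point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def two_step(points, bandwidth1, k1, recluster, bandwidth2, k2):
    """Step one: coarse clustering. Step two: re-cluster one coarse cluster."""
    coarse = density_peaks(points, bandwidth1, k1)
    idx = [i for i, c in enumerate(coarse) if c == recluster]
    fine = density_peaks([points[i] for i in idx], bandwidth2, k2)
    labels = [(c, 0) for c in coarse]
    for i, f in zip(idx, fine):
        labels[i] = (recluster, f)
    return labels

def blob(cx):
    """Five toy points around x = cx (hypothetical embedded coordinates)."""
    return [(cx, 0.0), (cx + 0.2, 0.0), (cx, 0.2), (cx + 0.2, 0.2), (cx + 0.1, 0.1)]

# One isolated group and two neighbouring groups that step one merges.
points = blob(0.0) + blob(10.0) + blob(12.0)
coarse = density_peaks(points, bandwidth=3.0, n_centers=2)
labels = two_step(points, 3.0, 2, recluster=coarse[5], bandwidth2=0.5, k2=2)
```

In this toy example the wide kernel in step one separates the isolated group from the two neighbouring groups, which it treats as a single population. The narrower kernel in step two then splits that population into its two less distinct subgroups, so each point ends up with a (coarse, fine) label pair.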
Keywords/Search Tags: single-cell RNA sequencing, high sparsity, Unsupervised Clustering Algorithms, Gaussian kernel functions, t-SNE, Density Clustering