
A Pipeline Analysis Method For Single Cell Sequencing Data

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Zhang
GTID: 2417330590473533
Subject: Applied Statistics
Abstract/Summary:
The latest generation of sequencing technology is single-cell sequencing, and scRNA-seq (single-cell RNA sequencing) is one of its representative technologies. It can resolve cell population heterogeneity and helps to discover and define new cell subtypes. Commonly used analysis methods first run quality control on the raw data, then impute missing values and normalize expression. Clustering analysis then separates the cells into groups, and differentially expressed genes are screened out from the clusters. Finally, further biological analyses are performed, including cell type enrichment analysis and transcriptional dynamics analysis. Accurate clustering is a critical and challenging task throughout the entire pipeline.

This paper proposes a pipeline method, LAK (Lasso And K-means Based Single Cell RNA Sequencing Data Analysis Pipeline). LAK integrates pre-processing, normalization, feature extraction, clustering, differential expression analysis and cell type identification into one complete tool. The paper focuses on the clustering step, improving existing methods to increase clustering accuracy. For the other steps, it analyzes, compares and screens existing mature methods, and adopts those that are accurate, stable and computationally cheap. The Lasso penalty is integrated into the clustering algorithm as a feature selection method. It narrows the range of candidate genes and extracts the genes that actually affect the clustering. As a result, no separate gene selection or dimensionality reduction step is needed, and LAK can be applied directly to single-cell sequencing data. In addition, the paper proposes a binary search algorithm for selecting the parameter of the clustering algorithm, which tunes that parameter adaptively according to the size of the data. Compared with other clustering methods, the clustering method in LAK shows better stability and accuracy on actual scRNA-seq data. The paper also applies the complete analysis process to a public dataset and assigns a specific cell type to each cell. It draws conclusions consistent with the relevant biological literature, further verifying the accuracy of the entire analysis process.
Keywords/Search Tags: scRNA-seq, clustering analysis, Lasso, K-means, cell type identification
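The preprocessing steps listed in the abstract (quality control followed by normalization) can be sketched as below. This is a minimal illustration, not LAK's code. Several choices here are common conventions assumed for the sketch: a cells x genes count matrix, the thresholds `min_genes_per_cell` and `min_cells_per_gene`, scaling each cell to 10,000 total counts, and the log1p transform. Imputation of missing values is left out entirely. The function names are hypothetical.

```python
import numpy as np

def quality_control(counts, min_genes_per_cell=200, min_cells_per_gene=3):
    """Drop cells with too few detected genes, then genes seen in too few cells.

    counts: cells x genes matrix of raw counts (assumed layout).
    The default thresholds are illustrative, not values taken from LAK.
    Returns the filtered matrix plus boolean masks over the original cells/genes.
    """
    counts = np.asarray(counts, dtype=float)
    keep_cells = (counts > 0).sum(axis=1) >= min_genes_per_cell
    counts = counts[keep_cells]
    keep_genes = (counts > 0).sum(axis=0) >= min_cells_per_gene
    return counts[:, keep_genes], keep_cells, keep_genes

def normalize_log(counts, scale=1e4):
    """Scale every cell to `scale` total counts, then apply log(1 + x)."""
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # all-zero cells stay at zero instead of dividing by 0
    return np.log1p(counts / lib_size * scale)
```

Genes are filtered after cells, so a gene's detection count reflects only the cells that survived quality control.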
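The abstract does not spell out how the Lasso penalty enters K-means. One well-known way to combine the two is sparse K-means (Witten and Tibshirani, 2010), sketched below as a stand-in rather than as LAK's actual algorithm. Each gene receives a nonnegative weight. The weight vector has L2 norm at most 1 and L1 (lasso) norm at most `s`. K-means runs on the weighted data, and the weights are then recomputed from each gene's between-cluster sum of squares by soft-thresholding. Genes that do not separate the clusters end up with weight exactly zero. All function names and defaults are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's K-means with k-means++ seeding; keeps the lowest-inertia run."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++: each new center is drawn with probability proportional to D^2
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def between_ss(X, labels):
    """Per-gene between-cluster sum of squares (total SS minus within-cluster SS)."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        Xj = X[labels == j]
        within += ((Xj - Xj.mean(axis=0)) ** 2).sum(axis=0)
    return total - within

def update_weights(a, s, n_iter=60):
    """Maximize w.a subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0.

    Assumes s >= 1 and that the largest entry of a is positive and unique.
    """
    a = np.maximum(a, 0.0)  # clip tiny negative round-off

    def w_at(delta):
        w = np.maximum(a - delta, 0.0)  # soft-threshold (a is already >= 0)
        return w / np.linalg.norm(w)

    if w_at(0.0).sum() <= s:
        return w_at(0.0)  # L1 bound inactive, no shrinkage needed
    lo, hi = 0.0, a.max()
    for _ in range(n_iter):  # binary search for the threshold delta
        mid = (lo + hi) / 2
        if w_at(mid).sum() > s:
            lo = mid
        else:
            hi = mid
    return w_at(hi)  # hi always satisfies the L1 bound

def sparse_kmeans(X, k, s, n_outer=20, seed=0):
    """Alternate weighted K-means and L1-constrained gene-weight updates."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))  # start from equal weights
    for _ in range(n_outer):
        # K-means under the weighted distance sum_j w_j * (x_ij - x_i'j)^2
        labels = kmeans(X * np.sqrt(w), k, rng)
        new_w = update_weights(between_ss(X, labels), s)
        converged = np.abs(new_w - w).sum() < 1e-4 * np.abs(w).sum()
        w = new_w
        if converged:
            break
    return labels, w  # genes with w_j == 0 are treated as uninformative
```

With `s` near 1, only a handful of genes keep nonzero weight. At `s = sqrt(p)` the L1 bound is inactive, and every gene with positive between-cluster variance is kept.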
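The abstract also mentions a binary search that picks the clustering parameter according to the size of the data, without giving details. Assume the parameter is an L1 (lasso) bound `s` on nonnegative, unit-L2-norm gene weights, as in sparse K-means. A binary search then works because the number of genes with nonzero weight never decreases as `s` grows. The sketch below finds the smallest `s` in [1, sqrt(p)] that keeps at least `target_genes` genes. The abstract does not state how LAK derives a target from the data size, so `target_genes` is simply an input here. The names `l1_weights` and `choose_s` are hypothetical.

```python
import numpy as np

def l1_weights(a, s, n_iter=60):
    """Nonnegative unit-L2 weights with ||w||_1 <= s, by soft-thresholding scores a.

    Assumes s >= 1 and that the largest score in a is positive and unique.
    """
    a = np.maximum(np.asarray(a, dtype=float), 0.0)

    def w_at(delta):
        w = np.maximum(a - delta, 0.0)
        return w / np.linalg.norm(w)

    if w_at(0.0).sum() <= s:
        return w_at(0.0)
    lo, hi = 0.0, a.max()
    for _ in range(n_iter):  # inner binary search on the soft threshold
        mid = (lo + hi) / 2
        if w_at(mid).sum() > s:
            lo = mid
        else:
            hi = mid
    return w_at(hi)

def choose_s(a, target_genes, tol=1e-3):
    """Smallest L1 bound s in [1, sqrt(p)] keeping >= target_genes nonzero weights.

    Valid because the nonzero count is non-decreasing in s.
    Returns sqrt(p) if the target cannot be reached.
    """
    lo, hi = 1.0, float(np.sqrt(len(a)))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if np.count_nonzero(l1_weights(a, mid)) >= target_genes:
            hi = mid  # target met: try a tighter bound
        else:
            lo = mid  # too few genes: loosen the bound
    return hi
```

For example, take per-gene scores `a = [10, 8, 6, 4, 2, 0.5]` and `target_genes = 3`. The search settles just above s = 3/sqrt(5) ≈ 1.342, the bound at which the third gene's weight becomes nonzero.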
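For the differential expression and cell type identification steps, the abstract says only that LAK adopts existing mature methods. The sketch below is a simple stand-in, not LAK's procedure. It ranks marker genes for each cluster with a Welch t-statistic (cluster versus all other cells). It then labels each cluster with the reference cell type whose marker set overlaps the cluster's top genes the most. The reference dictionary, the gene names and `top_n` are hypothetical inputs.

```python
import numpy as np

def rank_markers(X, labels, genes, top_n=10):
    """Top marker genes per cluster by Welch t-statistic (cluster vs. rest).

    X: cells x genes log-normalized matrix; labels: cluster label per cell;
    genes: gene names. Each group needs at least two cells (ddof=1 variance).
    """
    markers = {}
    for k in np.unique(labels):
        inside, outside = X[labels == k], X[labels != k]
        diff = inside.mean(axis=0) - outside.mean(axis=0)
        se = np.sqrt(inside.var(axis=0, ddof=1) / len(inside)
                     + outside.var(axis=0, ddof=1) / len(outside))
        t = diff / np.maximum(se, 1e-12)  # guard genes with zero variance
        top = np.argsort(-t)[:top_n]       # largest positive t first
        markers[k] = [genes[j] for j in top]
    return markers

def annotate(markers, reference):
    """Label each cluster with the reference type sharing the most markers.

    reference: dict mapping a cell type name to a set of marker genes.
    """
    return {k: max(reference, key=lambda ct: len(set(m) & reference[ct]))
            for k, m in markers.items()}
```

A rank-based test such as the Wilcoxon rank-sum test, with multiple-testing correction, is a common alternative. The t-statistic keeps this sketch free of dependencies beyond NumPy.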