| In the past few years, research on single-cell RNA sequencing (scRNA-seq) has advanced rapidly, with applications to human health and to the diagnosis, monitoring, and treatment of disease. Cell typing is one of the central problems in the analysis of single-cell datasets. With the spread of next-generation sequencing technologies, the scale of scRNA-seq data has grown dramatically: mainstream single-cell sequencing throughput has risen from thousands of cells to tens of thousands of cells. Many unsupervised clustering algorithms for cell typing on large-scale single-cell data reduce time complexity mainly by simplifying the cell-cell relationship network, but this lowers both clustering accuracy and robustness. Existing high-precision cell typing methods, in turn, cannot handle large-scale data well. In addition, most current differentially expressed gene analyses fit a single probability distribution. Given the high heterogeneity and high dropout noise of scRNA-seq data, there is therefore much room to improve current cell clustering and differentially expressed gene detection methods. To this end, this paper proposes an unsupervised cell clustering algorithm named SCMC and a differentially expressed gene detection method named DEGman, both designed for large-scale scRNA-seq data. SCMC first combines k nearest neighbors with cell similarity thresholds to construct a novel cell-cell relationship network, and then applies an improved Markov clustering algorithm to cluster the cells. SCMC also uses multi-core CPUs and GPUs for heterogeneous parallel computing to increase its speed. Compared with several mainstream single-cell clustering algorithms on seven large single-cell datasets, SCMC achieved higher clustering accuracy in identifying cell types, including rare cell types, which indicates that it is better suited to cell typing of current single-cell sequencing data. For the subsequent differentially expressed gene analysis, DEGman uses the Bhattacharyya distance, attempts to fit multiple candidate distributions, and finally detects differentially expressed genes with a permutation test. Compared with other differentially expressed gene analysis tools, DEGman achieves the best balance of sensitivity and accuracy, and it successfully identifies differentially expressed genes associated with fear in mouse brains. These results suggest that both algorithms are well suited to single-cell sequencing data analysis. |
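
The two-stage clustering idea described in the abstract (a k-nearest-neighbor network pruned by a similarity threshold, followed by Markov clustering) can be illustrated with a minimal sketch. This is not the SCMC implementation: it uses the standard Markov clustering loop (expansion and inflation) rather than the improved variant the paper proposes, and the cosine similarity measure, the function names `knn_threshold_graph` and `markov_cluster`, the parameter values, and the toy expression vectors are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two expression vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_threshold_graph(cells, k, min_sim):
    """Connect i and j if j is among i's k most similar cells AND sim >= min_sim."""
    n = len(cells)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(cells[i], cells[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for sim, j in sims[:k]:
            if sim >= min_sim:  # the threshold prunes weak k-NN edges
                adj[i][j] = adj[j][i] = 1.0
        adj[i][i] = 1.0  # self-loops, as is customary for Markov clustering
    return adj

def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] for c in range(n)] for r in range(n)]

def markov_cluster(adj, inflation=2.0, max_iter=50, tol=1e-9):
    """Standard Markov clustering: alternate expansion and inflation until stable."""
    n = len(adj)
    m = normalize_columns(adj)
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walk).
        expanded = [
            [sum(m[r][t] * m[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)
        ]
        # Inflation: element-wise power, then renormalize columns.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[r][c] - m[r][c]) for r in range(n) for c in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows with a nonzero diagonal entry are attractors; their support is a cluster.
    clusters = []
    for r in range(n):
        if m[r][r] > 1e-6:
            members = tuple(c for c in range(n) if m[r][c] > 1e-6)
            if members not in clusters:
                clusters.append(members)
    return clusters

# Toy data: six "cells" with two "genes", forming two obvious groups.
cells = [[5, 0], [4, 1], [5, 1], [0, 5], [1, 4], [1, 5]]
adj = knn_threshold_graph(cells, k=3, min_sim=0.8)
clusters = markov_cluster(adj)
print(clusters)  # [(0, 1, 2), (3, 4, 5)]
```

With `k=3`, each cell's third nearest neighbor lies in the other group; the threshold `min_sim=0.8` removes those cross-group edges, so the network splits into two components before clustering begins. This sketch uses dense pure-Python matrices and is only practical for tiny inputs; large-scale data would need sparse matrices and the parallel computing the abstract mentions.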