| With the continuing maturation of single-cell sequencing technology, single-cell multi-omics research on the genome, transcriptome, and epigenome has strongly promoted the rapid development of molecular biology, genetics, clinical medicine, and other fields. With the explosive growth of single-cell sequencing data, sequencing data analysis has become a hot research topic in bioinformatics. Methods for denoising single-cell transcriptome data and clustering cells have achieved considerable success. However, existing methods still show limitations, especially on massive sequencing data, and some cannot meet the requirements of cutting-edge transcriptome research in terms of model robustness, clustering accuracy, and computational cost. The main work of the dissertation is summarized as follows. It introduces single-cell transcriptome sequencing technology and the associated analysis methods, summarizes the basic methods and current state of research on single-cell transcriptome data denoising and clustering, and analyzes the open difficulties and problems in these two tasks. It then studies denoising and clustering methods for the single-cell transcriptome, and designs and develops several methods and tools to address the bottleneck problems in current single-cell transcriptome data analysis. The main research contents of this dissertation are as follows: (1) Existing scRNA-seq data denoising algorithms focus too much on local neighborhood information between cells, which makes the models prone to overfitting. To address this problem, a single-cell transcriptome denoising method based on an evolutionary sparse model (scESI) is proposed. The method uses a sparse representation model to measure the adjacency relationships between cells. An evolutionary iterative search
algorithm is introduced to mine the Pareto-optimal cellular adjacency matrix, which both retains the topological relationships between cells and preserves the diversity of gene expression patterns. Finally, the Pareto-optimal cellular adjacency matrix is applied to reduce noise and improve data quality. Experimental results on internationally recognized benchmark datasets show that scESI not only performs well in data denoising but also benefits downstream analysis tasks such as clustering, cell visualization, cell differentiation trajectory reconstruction, and differentially expressed gene selection. The case study results show that scESI can both recover undetected expression characteristics and find functionally specific cell subsets. (2) Existing scRNA-seq data clustering algorithms have difficulty measuring the similarity between cells comprehensively. To solve this problem, a multi-scale clustering method based on tensor graph diffusion (MTGDC) is proposed. A multi-scale similarity measure is used to mine the local topological relationships of cells, and a tensor graph diffusion framework learns the global high-order topological information between cells. An efficient tensor graph diffusion update algorithm is introduced to significantly improve the ability to learn differences between cells without increasing the computational burden, thereby improving clustering accuracy. Experimental results on internationally recognized benchmark datasets show that MTGDC performs better in clustering and visualization than the compared algorithms. The case study demonstrates that MTGDC can accurately detect rare mouse cortical cell subtypes and detect new functionally specific neuronal subtypes in mouse neurons. (3) To address the insufficient robustness of existing scRNA-seq data clustering methods, a multi-objective particle swarm optimized clustering ensemble method is
proposed (CEMP). A multi-subspace embedding method and a basic clustering module are designed to ensure the diversity and robustness of the basic clustering results, and a multi-objective particle swarm optimization integration framework is designed to further improve the robustness of the clustering results. Convergence analysis proves that the model can obtain globally optimal clustering results. Experimental results on internationally recognized benchmark datasets show that CEMP is superior to the compared clustering methods in terms of clustering accuracy and robustness. The case study on mouse nerve cells also shows that CEMP achieves good clustering performance on datasets with complex clustering structures; that is, it successfully identifies the main cell types and their corresponding cell subtypes. (4) To improve analysis speed and clustering performance, a semi-supervised fast clustering method based on integrating multi-source transcriptome data (LFSC) is proposed. This method is the first to introduce an anchor graph that integrates reference sample information to implicitly measure the level of difference between cells. The model complexity is reduced to the linear level, while the reference sample information is used to improve the clustering accuracy and visualization quality of the model. The results on internationally recognized benchmark datasets show that LFSC is superior to the benchmark methods in cluster analysis, data visualization, and model robustness. In addition, case studies on hepatocellular carcinoma infiltrating T cells demonstrate that LFSC performs well in discovering new cell types, discovering differentially expressed genes, and exploring new cancer-related biomarkers. This work focuses on the denoising and clustering of single-cell transcriptome data and further improves the accuracy, robustness, scalability, and computational performance of single-cell transcriptome analysis tools, aiming to fully mine the potential genetic information
in single-cell sequencing data. I designed and developed single-cell transcriptome data denoising and clustering methods and analysis tools to address the bottleneck problems of current single-cell transcriptome analysis. These methods promote related research based on scRNA-seq technology, provide new research ideas and techniques for cutting-edge single-cell transcriptome research, and show high application value. |