| Recent developments in single-cell RNA sequencing (scRNA-seq) technology have enabled researchers to measure gene expression levels at single-cell resolution. Clustering analysis of scRNA-seq data can further characterize cell types and states, providing new ways to study intercellular heterogeneity in complex tissues. However, for noisy, high-dimensional, and sparse scRNA-seq data, existing methods still have difficulty capturing the intrinsic characteristics and structure of cells, and they rarely use prior knowledge to guide clustering, resulting in a mismatch between the clustering results and reality. Most traditional deep clustering algorithms are unsupervised and usually ignore prior knowledge and the distance information between similar cells when clustering scRNA-seq data, so the resulting clusters may not agree with established biological knowledge. Semi-supervised clustering algorithms have therefore gradually been proposed. Current semi-supervised clustering methods often use only a single type of constraint information, either label constraints or pairwise constraints, so the performance of the clustering algorithm depends mainly on the quality of that constraint information. If the constraint information is noisy, the overall clustering quality suffers. To solve these problems, we develop a deep enhanced constraint clustering algorithm and a structured clustering algorithm, both based on contrastive learning. In this thesis, we first propose scDECL, an enhanced constraint clustering algorithm based on contrastive learning. scDECL first pre-trains an autoencoder with interpolation-based contrastive learning to obtain better parameters and learn an effective feature representation of the data. scDECL then uses the pre-trained model to construct the reconstruction loss and clustering loss and to obtain the latent features of all data. Next, the algorithm applies two different transformation rules to convert the prior label information and the pairwise distance information of cells
into pairwise constraints, respectively, and then merge them into enhanced pairwise constraints to optimize clustering. Experimental results show that this strategy yields high-quality constraint information and further improves clustering accuracy. We compare scDECL with six other advanced algorithms on six datasets, and scDECL achieves better results on three evaluation metrics. In addition, visual analysis of the final clustering results further verifies the algorithm's ability to learn low-dimensional representations of high-dimensional data. Because scDECL clusters mainly on the basis of data features, this thesis further proposes scDESC, a GNN-based enhanced structured clustering algorithm for scRNA-seq data that exploits the structural information between cells. scDESC consists mainly of a contrastive learning-based autoencoder module, an enhanced graph neural network module, and a mutual supervision module. First, the autoencoder is pre-trained using contrastive learning, and the zero-inflated binomial distribution is used to optimize the training process of the autoencoder. Next, the encoding layers of the pre-trained autoencoder are connected to the graph neural network module through residual connections; information is propagated and fused layer by layer to obtain the representation, and a graph attention mechanism performs the weighted fusion. Finally, a mutual supervision strategy is used to update the network simultaneously. We compare scDESC with seven other state-of-the-art algorithms on six datasets; scDESC performs better on three evaluation metrics, and the final clustering results are visualized and analyzed. Ablation experiments are also performed to demonstrate the effectiveness of each part of the algorithm. |
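
The abstract describes converting prior labels and pairwise distances into pairwise constraints and merging them, but it does not state the two transformation rules. The following is a minimal sketch of the general idea under assumed rules, not the scDECL implementation: labeled cells with the same label form a must-link pair and those with different labels form a cannot-link pair; the closest pairs in latent space become must-link and the farthest become cannot-link; on conflict, the label-derived constraint wins. The function names, the `-1` convention for unlabeled cells, and the `n_pairs` parameter are all hypothetical.

```python
import numpy as np

def label_constraints(labels):
    """Build pairs from partial labels; -1 marks an unlabeled cell (assumed convention)."""
    idx = np.flatnonzero(labels >= 0)
    must, cannot = set(), set()
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = int(idx[a]), int(idx[b])
            # Same label -> must-link, different label -> cannot-link (assumed rule).
            (must if labels[i] == labels[j] else cannot).add((i, j))
    return must, cannot

def distance_constraints(z, n_pairs):
    """Closest n_pairs pairs -> must-link, farthest n_pairs -> cannot-link (assumed rule)."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    iu, ju = np.triu_indices(len(z), k=1)          # each unordered pair once, i < j
    order = np.argsort(d[iu, ju], kind="stable")   # ascending distance
    pairs = [(int(i), int(j)) for i, j in zip(iu[order], ju[order])]
    return set(pairs[:n_pairs]), set(pairs[-n_pairs:])

def merge_constraints(label_c, dist_c):
    """Union both sources; a label-derived constraint overrides a conflicting distance one."""
    ml_label, cl_label = label_c
    ml_dist, cl_dist = dist_c
    must = ml_label | (ml_dist - cl_label)
    cannot = cl_label | (cl_dist - ml_label)
    return must, cannot

# Toy latent features for four cells; cells 0 and 3 share a known label.
z = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
labels = np.array([0, -1, -1, 0])

must, cannot = merge_constraints(label_constraints(labels),
                                 distance_constraints(z, n_pairs=2))
print(sorted(must), sorted(cannot))
```

In this toy case the distance rule alone would mark cells 0 and 3 as cannot-link because they are the farthest pair, while the labels say they belong together; the merge step keeps the label-derived must-link and drops the conflicting distance-derived constraint.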