Study Of Mining Algorithms For Single Cell RNA-Sequencing Data

Posted on:2021-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:H R He

Full Text:PDF

GTID:2370330602475019

Subject:Computer application technology

Abstract/Summary:

Single-cell RNA sequencing (scRNA-seq), a technology based on high-throughput sequencing developed in recent years, measures gene expression at single-cell resolution and obtains expression levels for thousands of genes in each cell. This supports identifying the gene expression signatures of different cell types and revealing the heterogeneity between cells. However, because of the limitations of sequencing technology and the high complexity of gene expression, scRNA-seq data are noisy, high-dimensional and highly sparse, so traditional clustering techniques separate cell populations with low accuracy. To improve the accuracy of cell population identification from scRNA-seq data, this thesis analyzes the problems in the data preprocessing, dimensionality reduction and clustering steps of the traditional scRNA-seq processing pipeline and proposes reducing dimensionality with an autoencoder. Because a Stacked Denoising Auto Encoder (SDAE) minimizes information loss and handles noisy data well, two dimensionality-reduction clustering methods are proposed: SDAE-DBSCAN and SDAE-K-means. Experimental results show that the proposed methods reduce the original algorithms' dependence on parameter settings and improve the clustering accuracy of cell populations. The main research contents are as follows:

(1) In the preprocessing stage, the loss of useful data was reduced by screening out a smaller proportion of the data, and L2 regularization was proposed to preprocess the data. This not only reduces the large differences in expression levels between genes but also damps the "strong" features, allowing smaller but more characteristic features to emerge.

(2) Because the variance contribution rate of traditional PCA is not concentrated in a few principal components when applied to scRNA-seq data, SDAE was proposed to reduce both the dimensionality and the noise of the data. Noise was added to the original data by randomly setting entries to zero, and the model's generalization ability was improved by learning features of this corrupted data. Through feature learning on scRNA-seq data, the method can automatically identify noise points and learn more robust features, providing better inputs for cell clustering and thus improving cell population identification.

(3) Traditional clustering algorithms require the number of clusters to be set in advance and achieve low accuracy, so the DBSCAN algorithm was proposed for clustering scRNA-seq data. Because the shape and structure of gene expression data in multi-dimensional space are difficult to analyze, K-means is not guaranteed to be applicable. Moreover, gene expression reflects cell function, and the functional expression of cells of the same type should be continuous within a similar spatial structure. DBSCAN was therefore used for cluster analysis. However, the values of Eps and MinPts strongly influence DBSCAN clustering, so an improved adaptive algorithm for computing these parameter values was proposed to raise its accuracy. For traditional K-means clustering, it was found that reducing the dimensionality of scRNA-seq data with SDAE improves the accuracy of K-means to a certain extent.

Experiments were conducted on the Deng dataset. The results show that the clustering accuracy of the two proposed deep combined models, SDAE-DBSCAN and SDAE-K-means, reaches 0.97 and 0.93 respectively, which is 0.2 and 0.16 higher than the traditional SC3 model.
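Step (1) says data loss was reduced by screening out a smaller proportion of the data, but the summary does not give the screening rule. The sketch below assumes a common kind of gene filter: keep a gene if it is detected (nonzero) in at least a given fraction of cells. The helper `filter_genes` and its threshold are hypothetical; lowering `min_cell_fraction` is one way to discard less data.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = cells, columns = genes


def filter_genes(matrix: Matrix, min_cell_fraction: float) -> Tuple[Matrix, List[int]]:
    """Keep genes expressed (nonzero) in at least `min_cell_fraction` of cells.

    Returns the filtered matrix and the indices of the kept genes.
    """
    n_cells = len(matrix)
    n_genes = len(matrix[0]) if matrix else 0
    threshold = min_cell_fraction * n_cells
    kept = [g for g in range(n_genes)
            if sum(1 for row in matrix if row[g] != 0) >= threshold]
    return [[row[g] for g in kept] for row in matrix], kept


counts = [[1, 0, 2],
          [3, 0, 0],
          [1, 5, 4],
          [2, 0, 0]]
strict, strict_kept = filter_genes(counts, 0.5)    # stricter screening drops gene 1
lenient, lenient_kept = filter_genes(counts, 0.25)  # lenient screening keeps every gene
```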
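The L2 preprocessing in step (1) is not specified further in this summary. A minimal sketch, assuming "L2 regularization" here means scaling expression vectors to unit Euclidean (L2) norm; since it is unclear whether the thesis normalizes each cell or each gene, the hypothetical helper `l2_normalize` supports both.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = cells, columns = genes


def l2_normalize(matrix: Matrix, by: str = "cell") -> Matrix:
    """Scale vectors of a cells x genes matrix to unit L2 norm.

    by="cell": each row (one cell's expression profile) is normalized.
    by="gene": each column (one gene across all cells) is normalized.
    All-zero vectors stay zero to avoid division by zero.
    """
    if by == "gene":
        # Transpose, normalize the rows, then transpose back.
        transposed = [list(col) for col in zip(*matrix)]
        return [list(col) for col in zip(*l2_normalize(transposed, by="cell"))]
    if by != "cell":
        raise ValueError("by must be 'cell' or 'gene'")
    result = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm for v in row] if norm > 0 else [0.0] * len(row))
    return result


counts = [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0], [1.0, 2.0, 2.0]]
print(l2_normalize(counts))
```

After this scaling every nonzero vector has length 1, so a few very highly expressed entries no longer dominate distance computations in later steps.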
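For step (2), the "contribution rate" of PCA is the standard proportion of variance explained. With the eigenvalues of the data covariance matrix sorted as λ1 ≥ λ2 ≥ ... ≥ λp, the contribution rate of the k-th principal component and the cumulative contribution rate of the first m components are:

```latex
\eta_k = \frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i},
\qquad
\eta_{1:m} = \sum_{k=1}^{m} \eta_k = \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{i=1}^{p} \lambda_i}
```

Read this way, a contribution rate that is "not concentrated" means the cumulative rate grows slowly with m, so many components must be kept to retain most of the variance.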
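Step (2) trains an SDAE on data corrupted by randomly zeroing entries. The sketch below is a minimal pure-Python illustration, not the thesis's model: each layer has a sigmoid encoder and a linear decoder, is trained by plain SGD to reconstruct the clean input from a masked copy, and layers are stacked greedily on the previous layer's codes. Layer sizes, learning rate, epochs, masking fraction and all names are hypothetical, and the toy data are synthetic; a practical version would use a deep-learning framework and may also fine-tune the whole stack.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DenoisingAutoencoder:
    """One layer: sigmoid encoder, linear decoder, trained to reconstruct the
    clean input from a copy with randomly zeroed entries (masking noise)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        s = 1.0 / math.sqrt(n_in)
        self.W = [[self.rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[self.rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_in)]
        self.c = [0.0] * n_in

    def encode(self, x: Vector) -> Vector:
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h: Vector) -> Vector:
        return [sum(v * hj for v, hj in zip(row, h)) + ci
                for row, ci in zip(self.V, self.c)]

    def loss(self, data: List[Vector]) -> float:
        """Mean squared reconstruction error on the uncorrupted data."""
        return sum(sum((a - b) ** 2 for a, b in zip(self.decode(self.encode(x)), x))
                   for x in data) / len(data)

    def train(self, data: List[Vector], epochs: int = 200, lr: float = 0.05,
              mask_fraction: float = 0.2) -> None:
        for _ in range(epochs):
            for x in data:
                # Corrupt: zero each entry independently with probability mask_fraction.
                x_noisy = [0.0 if self.rng.random() < mask_fraction else v for v in x]
                h = self.encode(x_noisy)
                # Error against the CLEAN input x (gradient of 0.5 * ||x_hat - x||^2).
                d_out = [a - b for a, b in zip(self.decode(h), x)]
                # Backpropagate through the decoder and the sigmoid (uses pre-update V).
                d_h = [sum(self.V[i][j] * d_out[i] for i in range(len(x))) * h[j] * (1.0 - h[j])
                       for j in range(len(h))]
                for i, err in enumerate(d_out):
                    for j, hj in enumerate(h):
                        self.V[i][j] -= lr * err * hj
                    self.c[i] -= lr * err
                for j, dj in enumerate(d_h):
                    for k, xk in enumerate(x_noisy):
                        self.W[j][k] -= lr * dj * xk
                    self.b[j] -= lr * dj


def train_stacked(data: List[Vector], layer_sizes: List[int], **train_kwargs) -> List[Vector]:
    """Greedy layer-wise stacking: each new layer is trained on the previous codes."""
    codes = data
    for depth, size in enumerate(layer_sizes):
        layer = DenoisingAutoencoder(len(codes[0]), size, seed=depth)
        layer.train(codes, **train_kwargs)
        codes = [layer.encode(x) for x in codes]
    return codes


# Toy synthetic "cells" (NOT real scRNA-seq data): two groups that are
# high on different halves of six genes.
rng = random.Random(1)
toy = [[rng.uniform(0.6, 1.0) if (g < 3) == (n % 2 == 0) else rng.uniform(0.0, 0.1)
        for g in range(6)] for n in range(20)]
layer = DenoisingAutoencoder(6, 3)
before = layer.loss(toy)
layer.train(toy)
after = layer.loss(toy)
codes = train_stacked(toy, [4, 2], epochs=50)  # 6 -> 4 -> 2 dimensional codes
```

The low-dimensional `codes` are what a downstream clustering step (DBSCAN or K-means) would receive in place of the raw expression matrix.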
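Step (3) clusters with DBSCAN, whose result depends on Eps and MinPts. For reference, a compact textbook DBSCAN in pure Python follows; the function names are hypothetical, neighbor search is brute force (quadratic in the number of points), and points that belong to no cluster are labeled -1 (noise).

```python
import math
from typing import List, Optional, Sequence

NOISE = -1
Point = Sequence[float]


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx] (itself included)."""
    p = points[idx]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    A point is a core point if at least min_pts points (itself included)
    lie within eps; clusters grow outward from core points.
    """
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

Unlike K-means, the number of clusters is not an input here; it emerges from Eps and MinPts, which is why the choice of those two values matters so much.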
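The summary does not describe the thesis's adaptive algorithm for Eps and MinPts. As a swapped-in stand-in, the sketch below automates the sorted k-distance graph heuristic commonly used with DBSCAN: compute each point's distance to its k-th nearest neighbor (k = MinPts - 1, assuming a DBSCAN whose neighborhood count includes the point itself), sort these values, and take Eps just before the largest jump. Using the largest jump as the "knee" is a crude rule chosen for brevity, and `estimate_eps` is a hypothetical name.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def k_distances(points: Sequence[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_eps(points: Sequence[Point], min_pts: int) -> float:
    """Pick Eps at the largest jump in the sorted k-distance curve, k = min_pts - 1."""
    k = min_pts - 1
    if k < 1 or len(points) < k + 2:
        raise ValueError("need min_pts >= 2 and more than min_pts points")
    kd = sorted(k_distances(points, k))
    gaps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    knee = max(range(len(gaps)), key=gaps.__getitem__)
    return kd[knee]  # points at or below the knee become core points


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(estimate_eps(pts, min_pts=3))
```

MinPts itself is often set by a rule of thumb tied to the data dimension; any such rule is likewise an assumption, not the thesis's procedure.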
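For the SDAE-K-means variant, K-means runs on the SDAE codes. A minimal Lloyd's-algorithm sketch follows, assuming Euclidean distance and initialization from k randomly chosen data points (a swapped-in choice; k-means++ is a common alternative). `kmeans` is a hypothetical helper, and the number of clusters k must be supplied, which is the limitation step (3) mentions.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialization (assumption): k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments unchanged
        labels = new_labels
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


blobs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centers = kmeans(blobs, 2)
```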
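The reported accuracies (0.97 and 0.93) depend on how clustering accuracy is defined, which this summary does not state. One common definition, assumed here, is the fraction of cells labeled correctly under the best one-to-one matching between predicted clusters and true cell types. The hypothetical `clustering_accuracy` brute-forces that matching (adequate for a handful of clusters; the Hungarian algorithm scales better) and, as a further assumption, counts DBSCAN noise points (label -1) as errors.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def clustering_accuracy(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable],
                        noise_label: Hashable = -1) -> float:
    """Best-matching accuracy between predicted clusters and true classes.

    Each predicted cluster maps to at most one true class and vice versa;
    points labeled `noise_label` are never matched, so they count as errors.
    """
    if len(true_labels) != len(pred_labels) or len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    classes = list(set(true_labels))
    clusters = [c for c in set(pred_labels) if c != noise_label]
    # pair_counts[(cluster, cls)] = number of points in that cluster with that class.
    pair_counts = Counter(zip(pred_labels, true_labels))
    best = 0
    if len(clusters) <= len(classes):
        for assigned in permutations(classes, len(clusters)):
            best = max(best, sum(pair_counts[pair] for pair in zip(clusters, assigned)))
    else:
        for chosen in permutations(clusters, len(classes)):
            best = max(best, sum(pair_counts[pair] for pair in zip(chosen, classes)))
    return best / len(true_labels)


truth = [0, 0, 0, 1, 1, 1]
print(clustering_accuracy(truth, [1, 1, 1, 0, 0, 0]))   # cluster ids are arbitrary
print(clustering_accuracy(truth, [0, 0, 0, 1, 1, -1]))  # one noise point
```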

Keywords/Search Tags:

single cell RNA-sequencing, single cell clustering, stacked denoising auto encoder, DBSCAN, K-means

Related items

1	Single Cell RNA-seq Clustering Method Based On Self-renewal Of Cell Relationship Matrix
2	Data Analysis Based On Dimension Reduction And Clustering Of Single Cell RNA Sequencing
3	Establishment Of Single Cell Analysis Platform Based On Microfluidics And Its Application
4	Research On Single-cell RNA Sequencing Data Analysis Method Based On LDA Model
5	Single-cell Clustering Method Based On Consensus Strategy Evaluation
6	A Method Of Normalization And Transformation For Single Cell RNA Sequencing Data
7	Single Cell RNA Sequencing Reveals The Difference Of Transcriptomes In SCNT Embryos Drived From Different Somatic Cell Type
8	Cardiac Cell Classification Based On Single-cell RNA-sequencing Data
9	Single Nucleic Acid Molecule Manipulation And Single Cell Sequencing
10	Research On Single Cell Clustering Based On Graph Similarity Learning