| With the rapid development of RNA sequencing techniques, single-cell RNA sequencing (scRNA-seq) has attracted worldwide attention. Classifying the scRNA-seq data acquired from patients and identifying the location of disease is of great significance. Because scRNA-seq data are high-dimensional and discrete, traditional classification methods are inefficient or fail outright. Single-cell classification is therefore challenging, specifically in terms of computational efficiency and classification accuracy. This paper analyzes the characteristics of scRNA-seq data and builds a classification model, Laz AE (learning the latent features for the classifications of single-cell RNA-seq samples with zero-inflated negative binomial distribution as the autoencoder loss). We demonstrate the superiority of the model by comparing it with other methods. scRNA-seq datasets contain far more zeros than a Poisson distribution would predict, so a zero-inflated Poisson distribution has been proposed to fit them. That distribution, however, does not account for the over-dispersion of scRNA-seq data, and a more reasonable distribution is needed. The zero-inflated negative binomial distribution takes both the over-dispersed and the zero-inflated pattern into account. For an scRNA-seq dataset with an excess of zeros, we assume that the data come from a mixture of two components: a negative binomial distribution and a point mass at zero. In recent years, deep learning has been widely used in various fields and has achieved breakthrough results in vision and natural language processing. In this paper, we propose a new framework that combines deep learning with conventional statistical models. Our model consists of two main parts: the estimation of parameters and the computation of discriminant scores. The first part is an autoencoder based on the zero-inflated negative binomial distribution, which performs the parameter estimation of the likelihood function. The second part is zero-inflated negative binomial logistic discriminant analysis, in which the trained parameters are substituted into the corresponding density function. Combining these densities with prior information, we obtain the discriminant scores of each sample for the different categories. We also compare against parameter estimation under other distributions to demonstrate the advantages of the Laz AE algorithm. This study presents a proof-of-principle Bayesian framework that integrates the statistical zero-inflated negative binomial distribution as the loss function of the autoencoder network. Experiments on a comprehensive series of real and simulated datasets compare the proposed Laz AE algorithm with existing methods. The proposed Laz AE algorithm demonstrates a strong capability for learning the latent patterns needed for classification tasks on scRNA-seq datasets. |
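
The mixture assumption and the discriminant scores described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean/dispersion parameterization of the negative binomial, the per-gene independence assumption, and all function and parameter names (`nb_log_pmf`, `zinb_log_pmf`, `discriminant_scores`, `posterior`, `mu`, `theta`, `pi`) are assumptions made here for illustration. In the paper the parameters are estimated by the autoencoder; here they are simply passed in.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log pmf of a negative binomial with mean mu > 0 and dispersion theta > 0.

    Assumed parameterization: variance = mu + mu**2 / theta.
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log pmf of the zero-inflated negative binomial mixture.

    With probability pi (0 <= pi < 1) the count is a structural zero;
    otherwise it is drawn from the negative binomial component.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from either mixture component.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def discriminant_scores(counts, class_params, priors):
    """Log prior plus summed per-gene ZINB log-likelihood for each class.

    counts:       list of non-negative integer counts, one per gene
    class_params: one list per class of (mu, theta, pi) tuples, one per gene
    priors:       prior probability of each class
    Genes are treated as independent given the class (an assumption).
    """
    scores = []
    for params, prior in zip(class_params, priors):
        score = math.log(prior)
        for x, (mu, theta, pi) in zip(counts, params):
            score += zinb_log_pmf(x, mu, theta, pi)
        scores.append(score)
    return scores


def posterior(scores):
    """Turn log scores into class probabilities with a stable softmax."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

A sample is assigned to the class with the largest score, for example `posterior(discriminant_scores(counts, class_params, priors))` followed by an argmax. Working in log space keeps the product over thousands of genes from underflowing.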