| As life science research enters the genomics era, sequencing technology is developing rapidly. Building on high-throughput sequencing, single-cell RNA sequencing (scRNA-seq) has emerged and has brought unprecedented progress to the study of gene expression profiles at single-cell resolution. However, because of the low amount of starting material, scRNA-seq suffers from a low capture rate and a high dropout rate, which seriously affects downstream data analysis. Several imputation methods have been proposed to address this problem. Although they have achieved encouraging results, problems remain: over-imputation introduces new noise, and the nonlinear dependencies between cells are not fully learned. To address these problems, this study proposes sc-Gimpute, an imputation method based on the BEGAN (Boundary Equilibrium GAN) network. The main contributions are as follows: (1) Introduce a mechanism for distinguishing true zeros from false zeros. The original sequencing data contain genes whose true expression value is zero, and these are indistinguishable from the zeros produced by dropout. If genes with a true expression value of zero are also filled in during imputation, new technical noise is introduced. To avoid the performance degradation caused by over-imputation, a true-zero/false-zero identification step is applied before imputation: the original data are modeled with the zero-inflated negative binomial (ZINB) distribution, and the genes that are actually missing are marked by estimating their dropout probability, which enables accurate recovery of dropout events in the subsequent steps. (2) Improve BEGAN with graph convolution and propose an imputation method, sc-Gimpute, that learns cell feature information in non-Euclidean space and imputes using the generated data. By replacing the linear convolutional layers of the BEGAN network with graph convolutional layers, the improved network can extract features from data in non-Euclidean space and train a generative model that produces new data conforming to the distribution of the original data. The K-nearest neighbor (KNN) algorithm is then used to recover the missing gene expression of individual cells from the generated data, so that the true expression of each missing value is inferred from realistic generated cells. This avoids the problem in existing methods of overfitting abundant cell types while learning too little about rare cells, and it preserves the biological differences between cells while removing the influence of noise. (3) Experimentally evaluate the imputation performance of sc-Gimpute. The method was validated on four publicly available real datasets and two simulated datasets and compared with five other mainstream imputation methods. The experiments show that, compared with existing mainstream methods, sc-Gimpute achieves good imputation accuracy. The data it recovers closely follow the expression trends of the real data, the method copes with increasing missing rates and shows better imputation stability, and it significantly improves downstream clustering. |
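
The two non-generative steps described above, flagging false zeros with a ZINB model and filling them from generated cells with KNN, can be illustrated with a minimal sketch. This is not the sc-Gimpute implementation. It assumes the ZINB parameters (`pi`, `mu`, `theta`) have already been estimated, uses the standard mixture posterior for a zero observation, and assumes Euclidean distance and mean aggregation for the KNN step. The function names, the value of `k`, and the toy data are hypothetical.

```python
import numpy as np


def dropout_posterior(pi, mu, theta):
    """Posterior probability that an observed zero is a dropout (false zero).

    Under a ZINB model, P(x = 0) = pi + (1 - pi) * NB(0 | mu, theta), where
    NB(0 | mu, theta) = (theta / (theta + mu)) ** theta. The share of that
    probability contributed by the zero-inflation term is returned.
    """
    pi, mu, theta = (np.asarray(a, dtype=float) for a in (pi, mu, theta))
    nb_zero = (theta / (theta + mu)) ** theta  # NB probability of a zero count
    return pi / (pi + (1.0 - pi) * nb_zero)


def knn_impute(observed, generated, false_zero_mask, k=3):
    """Fill only the flagged entries of each observed cell.

    Each flagged entry is replaced by the mean of that gene over the k
    generated cells closest to the observed cell. Distance is Euclidean and
    computed on the unflagged genes only (an assumption of this sketch).
    """
    imputed = observed.astype(float).copy()
    for i in range(observed.shape[0]):
        flagged = false_zero_mask[i]
        if not flagged.any():
            continue  # nothing to impute for this cell
        keep = ~flagged
        dist = np.linalg.norm(generated[:, keep] - observed[i, keep], axis=1)
        nearest = np.argsort(dist)[:k]
        imputed[i, flagged] = generated[nearest][:, flagged].mean(axis=0)
    return imputed


# Toy example: one cell, two genes; the second gene is flagged as a false zero.
observed = np.array([[1.0, 0.0]])
generated = np.array([[1.0, 5.0], [1.0, 7.0], [10.0, 100.0]])
mask = np.array([[False, True]])
result = knn_impute(observed, generated, mask, k=2)
```

In practice the mask would come from thresholding `dropout_posterior` on the zero entries. A highly expressed gene (large `mu`) that reads zero gets a posterior near 1 and is imputed, while a gene with `mu` near zero is more likely a true zero and is left untouched.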