
Research On Interpolation Algorithm Of Single-cell RNA Sequencing Data Based On U-Net Network

Posted on: 2022-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: K G Cui
GTID: 2510306767477414
Subject: Automation Technology
Abstract/Summary:
In recent years, with the development of bioinformatics, single-cell sequencing has been applied in more and more scenarios, producing a large amount of single-cell RNA sequencing (scRNA-seq) data. However, owing to the limitations of the technology, scRNA-seq data are highly sparse: more than 70% of the values are missing (zero). Some of these zeros are "true zeros" from genes that are not expressed, but many are "false zeros": counts for genes that should show low to medium expression in the cell but were lost to sequencing failure. The latter are referred to as the "dropout" phenomenon. Several imputation algorithms based on different principles and models address dropout, but their overall performance is not ideal, and some deep-learning-based imputation algorithms are even less accurate than non-deep-learning ones.

Drawing on a comprehensive study of image inpainting techniques and single-cell imputation algorithms, this thesis proposes a deep-learning-based single-cell imputation algorithm with high imputation accuracy and fast computation. Since downstream single-cell analysis (including cell typing, trajectory inference, and differential expression) is based on highly variable genes, the algorithm first normalizes the gene expression matrix and then filters the genes to retain the highly variable ones. This reduces the size of the matrix and thereby improves the efficiency of the subsequent steps. Each cell in the matrix of highly variable genes is then converted into an image. At the same time, the NumPy library is used to randomly generate a large data set of masks in batches; the masks cover part of the gene expression data and create a control group for subsequent training. The cell images are then fed into a partial convolutional neural network with the U-Net architecture for training; that is, the fully convolutional network in the U-Net architecture is modified into a partial convolutional network with mask update. Each layer of the network updates and fills the masked area until the masked area is fully assigned. After training, the original cell images are inpainted. In this process only the dropout regions of the original data are inpainted, and regions without dropout events are left unchanged. Finally, all images are reshaped into expression vectors and combined to complete the imputation.

To improve efficiency, the algorithm is implemented with PyTorch for parallel computation on the GPU, which greatly reduces its running time. Experimental comparison with classical probability- and statistics-based imputation algorithms and with deep-learning imputation algorithms shows that the proposed algorithm imputes scRNA-seq gene expression matrices with higher accuracy and speed, repairs dropout events in a targeted way, and effectively improves the results of downstream analysis.
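The steps described in the abstract can be illustrated with a small NumPy sketch. This is not the thesis implementation: the function names, the 3x3 kernel, the zero padding, and the rescaling by window size over valid-pixel count are assumptions that follow the common partial-convolution formulation from the image inpainting literature, and the bias term and the PyTorch/GPU training loop are omitted.

```python
import numpy as np


def cell_to_image(expr, side):
    """Zero-pad a 1-D expression vector and reshape it to a (side, side) image."""
    padded = np.zeros(side * side, dtype=np.float64)
    padded[: expr.size] = expr
    return padded.reshape(side, side)


def random_masks(n, side, hide_fraction, rng):
    """Batch of binary masks: 1 = value visible, 0 = value hidden for training."""
    return (rng.random((n, side, side)) >= hide_fraction).astype(np.float64)


def partial_conv3x3(image, mask, weight):
    """One 3x3 partial convolution (stride 1, zero padding) with mask update."""
    h, w = image.shape
    xp = np.pad(image * mask, 1)  # hidden pixels contribute nothing
    mp = np.pad(mask, 1)          # padded border counts as invalid
    out = np.zeros((h, w))
    new_mask = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            valid = mp[i:i + 3, j:j + 3].sum()
            if valid > 0:
                # Rescale by (window size / number of valid pixels in the window).
                window = xp[i:i + 3, j:j + 3]
                out[i, j] = (weight * window).sum() * (weight.size / valid)
                new_mask[i, j] = 1.0  # this position is now filled
    return out, new_mask


def fill_dropouts(original, predicted):
    """Replace only the zero entries of the original with predicted values."""
    return np.where(original == 0, predicted, original)


# Hypothetical usage on a toy 7-gene cell.
rng = np.random.default_rng(0)
image = cell_to_image(np.arange(1.0, 8.0), side=3)
mask = random_masks(1, 3, 0.3, rng)[0]
out, new_mask = partial_conv3x3(image, mask, np.full((3, 3), 1 / 9))
imputed = fill_dropouts(image, out)
```

Applying `partial_conv3x3` repeatedly grows the valid region by one pixel per layer, which corresponds to the abstract's statement that each layer updates the mask until the masked area is fully assigned. `fill_dropouts` mirrors the final step, in which only zero (dropout) positions receive predicted values.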
Keywords/Search Tags: imputation algorithm, single-cell RNA sequencing, gene expression data, U-Net network, deep learning