
Research On Calling Methods Of Structural Variation Based On Third Generation Sequencing Data

Posted on: 2021-01-01  Degree: Master  Type: Thesis
Country: China  Candidate: R F Bai
GTID: 2370330605975997  Subject: Computer technology
Abstract/Summary:
Third generation sequencing technology has developed rapidly, making it possible to infer the complete chromosome sequence of an individual from a small number of long reads. It has also advanced structural variation calling. Accurate calling of structural variations is crucial to the study of human genetic diversity and clinical disease. However, the gene sequences used for structural variation calling are isolated text information, and variation features have to be extracted manually. This extraction is strongly affected by human factors, so the calling results are one-sided and limited, and precision and sensitivity are unsatisfactory. Therefore, based on third generation sequencing data, this thesis proposes a deep learning method that calls structural variations using read alignment images as inputs, with the aim of improving calling precision. The main work is summarized as follows:

(1) Mapping gene sequences to read alignment images. The key problems in this mapping are studied: selection of mapping regions, design of the image coordinate system, extraction and calculation of variation signals, and the choice of color patterns. The text information expressed by gene sequences is isolated, obscure and contains some errors. Throughout the calling process it is therefore presented as read alignment images, which not only display the original alignment information intuitively but also extract and compute, as far as possible, the signals that distinguish variation regions from other regions. These images also provide efficient and reliable inputs for the subsequent deep learning image classification models.

(2) Calling structural variations with convolutional neural network image classification models. Custom-built models are trained: the text sequences of each candidate variation region are fed to the convolutional neural network image classification models in the form of read alignment images, and the trained models are used to distinguish structural variations. A cross-entropy loss function is used to optimize model performance, and CUDA is used to accelerate training, which removes the time bottleneck in the overall calling process.

(3) Evaluation of the proposed method, CnnSV3. The method is tested and compared with existing structural variation calling methods for third generation sequencing data, such as sniffles, SVIM and pbsv, to verify its advantages. Experiments are carried out on simulated sequencing data with different depths and different deletion lengths, and on real sequencing data with different depths and from different individuals. In addition, indirect evaluation based on Mendelian inheritance and data sampling is used to assess calling performance, which addresses the poor reliability of third generation sequencing benchmarks on real data. The experimental results show that the method detects a wider range of deletion lengths on both simulated and real data. The largest structural variation that can be detected accurately exceeds 200 million bases. Precision and sensitivity are both high, and the performance advantage is more pronounced on low-depth data. Finally, the calling results from second and third generation sequencing data of the same individual are compared. The results show that more than 11,500 deletions detected from third generation sequencing data cannot be detected from second generation sequencing data.
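
As a rough illustration of step (1), the sketch below renders the reads of one candidate region as a small grid, with one row per read and one column per reference position. The abstract does not give the actual coordinate design or color patterns of CnnSV3, so the pixel codes, the `Read` tuple layout and the function names `alignment_image` and `deletion_fraction` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical pixel codes; the real color patterns are not given in the abstract.
EMPTY, MATCH, DELETION = 0, 1, 2

# Assumed read layout: (start, end, [(del_start, del_end), ...]),
# all reference coordinates, all intervals half-open.
Read = Tuple[int, int, List[Tuple[int, int]]]


def alignment_image(reads: List[Read], region_start: int,
                    region_end: int) -> List[List[int]]:
    """Render one row per read and one column per reference position."""
    width = region_end - region_start
    image = []
    for start, end, deletions in reads:
        row = [EMPTY] * width
        # Mark the part of the region that the read covers.
        for pos in range(max(start, region_start), min(end, region_end)):
            row[pos - region_start] = MATCH
        # Overwrite covered positions that fall inside a deletion gap.
        for del_start, del_end in deletions:
            for pos in range(max(del_start, region_start),
                             min(del_end, region_end)):
                if row[pos - region_start] == MATCH:
                    row[pos - region_start] = DELETION
        image.append(row)
    return image


def deletion_fraction(image: List[List[int]]) -> List[float]:
    """Per-column fraction of covering reads that show a deletion."""
    fractions = []
    for column in zip(*image):
        covered = sum(1 for value in column if value != EMPTY)
        deleted = sum(1 for value in column if value == DELETION)
        fractions.append(deleted / covered if covered else 0.0)
    return fractions
```

A grid of this kind could be passed to an image classifier, and the per-column deletion fraction is one simple example of a variation signal that separates a candidate region from its flanks.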
Keywords/Search Tags: third generation sequencing technology, structural variation calling, read alignment image, deep learning