Genomic Structural Variant Prediction Algorithm And Software

Posted on: 2022-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Gu
Full Text: PDF
GTID: 2480306608970969
Subject: Computer Software and Application of Computer
Abstract/Summary:
Genomic structural variants have a major impact on biological phenotype and diversity. Comprehensive and accurate identification of structural variants among species, populations, organisms, and even tissues is an important step toward understanding the genetic components of phenotypic variation. In addition, structural variants have been linked to the occurrence and development of numerous complex diseases. Structural variant detection is therefore an important task in bioinformatics research. The recent rise of third-generation (long-read) sequencing technologies makes it possible to identify longer or more complex structural variants. As a result, long-read-based structural variant detection has attracted considerable attention, and many tools for detecting structural variants from long reads have been developed recently.

In this thesis, we present a new method, called SVLR, to detect structural variants from long-read sequencing data. SVLR consists of four main steps: identification, clustering, classification, and optimization. The first three steps are similar to those in SVIM, one of the best current methods for detecting genomic structural variants. Building on this, we improve each of these steps and add a final optimization step to improve the performance of SVLR. The details are as follows:

1. In the first step of SVLR, we introduce for the first time the concepts of "Source Region" and "Destination Region", which help describe the location of structural variants accurately, especially for cut-paste insertions, interspersed duplications, and translocations. We are also the first to consider a "Read Confidence Level" when distinguishing real structural variants from those caused by sequencing error or noise.
2. In the third step of SVLR, we consider for the first time the case of "Interactive Structural Variants", in which the positions of the source regions and destination regions of two structural variants may be staggered. In this way, SVLR significantly improves the accuracy of cut-paste insertion and interspersed duplication detection. Compared with existing methods, SVLR can also detect three new kinds of structural variants: block replacements, block interchanges, and translocations.
3. In the last step of SVLR, we add for the first time an optimization process that filters out situations that may produce conflicting or incorrect structural variants, which further improves the accuracy of SVLR.

Tests on simulated data show that SVLR is a powerful, high-precision method. Specifically, for the classic structural variants that state-of-the-art methods (e.g., SVIM and Sniffles) can detect, our experiments demonstrate improved recall without loss of precision. For the new structural variants, SVLR achieves accuracies comparable to those for the classic structural variants. We also test SVLR with different alignment tools (LAST and NGMLR) and under varying conditions such as coverage.
Keywords/Search Tags:Genome Variation, Genomic Structural Variant Detection, Third Generation Sequencing, Long-read Sequencing, Single-molecule Sequencing
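
The abstract does not spell out how SVLR represents source and destination regions or how it recognizes two structural variants whose regions refer to each other. The Python sketch below shows one possible reading, for illustration only. The names `Region`, `MovedSegment`, and `is_block_interchange`, the half-open coordinate convention, and the overlap criterion are all assumptions made here, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Half-open interval [start, end) on one contig (assumed convention)."""
    contig: str
    start: int
    end: int

    def overlaps(self, other: "Region") -> bool:
        # Two regions overlap only if they share a contig and at least one base.
        return (self.contig == other.contig
                and self.start < other.end
                and other.start < self.end)


@dataclass(frozen=True)
class MovedSegment:
    """Hypothetical record: a segment taken from `source` and placed at `destination`."""
    source: Region
    destination: Region


def is_block_interchange(a: MovedSegment, b: MovedSegment) -> bool:
    """Assumed criterion: each segment lands where the other one came from."""
    return a.source.overlaps(b.destination) and b.source.overlaps(a.destination)


# Two segments that swap places, plus an unrelated cut-paste event.
a = MovedSegment(Region("chr1", 1000, 1500), Region("chr1", 8000, 8400))
b = MovedSegment(Region("chr1", 8000, 8400), Region("chr1", 1000, 1500))
c = MovedSegment(Region("chr1", 20000, 20300), Region("chr1", 30000, 30001))

print(is_block_interchange(a, b))
print(is_block_interchange(a, c))
```

Keeping the source and the destination as two explicit regions in one record is what makes such a pairwise check possible. A real caller would presumably also need tolerance for imprecise breakpoints, which this sketch leaves out.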