Detection of Genome Structural Variations Based on Third-Generation Sequencing Data

Posted on: 2020-11-20  Degree: Doctor  Type: Dissertation
Country: China  Candidate: T Jiang
GTID: 1360330614450828  Subject: Computer application technology
Abstract/Summary:
With the continuing maturation and wide application of sequencing technology, research on the genome, the transcriptome and other omics built on sequencing has made great strides, driving change across genome science, genetics and clinical medicine. As one of the most important steps in genome analysis, structural variation detection is of great significance for genome annotation, for association analysis with diseases and phenotypes, and for clinical diagnosis. However, because the genome contains a large number of complex structural variations, existing detection technologies and methods cannot meet the needs of current genomic research in terms of the accuracy, sensitivity, comprehensiveness and computational performance of variant identification, and the growing volume of sequencing data makes the problem harder still. This thesis summarizes the basic approaches to genome structural variation detection and analyzes the main difficulties and open problems at the present stage. It then carries out a series of studies aimed at improving the accuracy and computational performance of structural variation detection, and develops a number of methods and tools for genomic structural variation detection that address bottlenecks in current genome research. The main research contents of this thesis are as follows:

(1) To address the difficulty of accurately and sensitively identifying large, highly similar mobile elements, this thesis studies rMETL, a method for identifying genomic mobile element variation based on read realignment. The method realigns the abnormally aligned portions of reads to known mobile element consensus sequences, which transforms complex and varied local alignment information into highly consistent evidence of mobile element variation. Experimental results on authoritative, internationally recognized sequencing data sets show that rMETL effectively improves the sensitivity of mobile element variant detection while maintaining high accuracy. The method is a valuable tool for accurately discovering mobile element variation and for exploring further associations with diseases and phenotypes.

(2) To address the inability of existing structural variation detection tools to detect DNA sequences that are absent from the reference genome, this thesis studies rCANID, a novel sequence insertion identification method based on local assembly and clustering. The method starts from the form that novel sequence insertions take in the data and combines it with local assembly. Abnormally aligned fragments and unaligned reads are each clustered and assembled, reconstructing two kinds of local sequences: those near the insertion breakpoint and those far from it. The two kinds of local sequences are then connected and merged by a heuristic algorithm to recover the complete novel sequence insertion. Experimental results on authoritative, internationally recognized sequencing data sets show that rCANID detects novel sequence insertions with higher sensitivity than existing structural variation detection algorithms. This helps discover DNA sequences unique to a sample and has biological significance for the discovery and treatment of some rare diseases.

(3) Because the recognition rate and sensitivity of structural variation detection remain low, this thesis studies cuteSV, a genomic structural variation recognition method based on multi-feature fusion. The method uses a multi-feature fusion clustering approach to cluster the multiple variation signals found in abnormally aligned sequencing reads, and then refines the structural variation calls using several kinds of genomic spatial structure information. It also takes the discovery of complex variations into account. Experimental results on authoritative, internationally recognized sequencing data sets show that cuteSV achieves the best overall detection performance and the best computational performance among current structural variation detection tools. The tool provides new support for related genomic analyses.

(4) To address the computational bottleneck of structural variation detection, this thesis studies rMFilter, a method that accelerates genome structural variation detection by filtering sequencing reads. The method builds a regional hash-table index and uses a fast statistic over regional seed hits. By classifying sequencing reads accurately and rapidly, it greatly reduces the input data volume at the very start of the analysis and thereby reduces the computational cost of the whole structural variation detection pipeline. Experimental results on authoritative, internationally recognized sequencing data sets show that combining rMFilter with mainstream structural variation detection pipelines more than doubles the overall speed of detection on third-generation sequencing data while producing the same structural variation results as the original workflow. The tool effectively speeds up structural variation detection and analysis and offers a practical way forward for large-scale genome analysis tasks.

This thesis focuses on genome structural variation detection and exploits the advantages of third-generation sequencing data to improve the accuracy, sensitivity, diversity and speed of genomic structural variation detection. By developing methods and tools for several types of structural variation, it addresses bottlenecks in current genome research. The work is of practical significance for advancing research centered on genome structural variation, and it provides new research ideas, technical means and theoretical support for frontier genome research.
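
The read-filtering idea in (4) can be illustrated with a toy sketch. This is not rMFilter's implementation: the abstract only mentions a regional hash-table index and regional seed-hit statistics, so the function names, the seed length `K`, the bin width `REGION_SIZE`, the `min_fraction` threshold and the rule "keep a read when no pair of adjacent regions explains most of its seeds" are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

K = 12            # seed (k-mer) length; toy value, an assumption
REGION_SIZE = 40  # width of each reference bin; toy value, an assumption


def build_region_index(reference, k=K, region_size=REGION_SIZE):
    """Hash table mapping each k-mer to the set of region ids where it starts."""
    index = defaultdict(set)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].add(pos // region_size)
    return index


def region_hits(read, index, k=K):
    """Count, for every region, how many seeds of the read hit that region."""
    hits = Counter()
    for pos in range(len(read) - k + 1):
        for region in index.get(read[pos:pos + k], ()):
            hits[region] += 1
    return hits


def is_sv_candidate(read, index, k=K, min_fraction=0.8):
    """Return True when the read should be kept for structural variation calling.

    Hypothetical rule: a read that aligns cleanly places most of its seeds in
    one region or in two adjacent regions (this assumes reads are no longer
    than one region). Anything else is kept as a possible variant carrier.
    """
    total = len(read) - k + 1
    if total <= 0:
        return True  # too short to judge, so keep it rather than discard it
    hits = region_hits(read, index, k)
    best = max((hits[r] + hits[r + 1] for r in list(hits)), default=0)
    return best / total < min_fraction


def filter_reads(reads, index, k=K, min_fraction=0.8):
    """Keep only the reads classified as structural variation candidates."""
    return [r for r in reads if is_sv_candidate(r, index, k, min_fraction)]


# Toy data: a random reference, one cleanly matching read, and one read that
# joins two distant reference segments (as a deletion-spanning read would).
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(200))
index = build_region_index(reference)
normal_read = reference[50:90]
split_read = reference[10:30] + reference[120:140]
kept = filter_reads([normal_read, split_read], index)
print(len(kept), "of 2 reads kept for downstream detection")
```

In this sketch the cleanly matching read is discarded before any alignment is attempted, which is where the speed-up would come from. A real filter would also need to handle sequencing errors, repeats and reads much longer than one region.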
Keywords/Search Tags: third-generation sequencing technologies, structural variant detection, local assembly, sequence realignment, variant calling acceleration