
Research On The Evaluation Method Of Genomic Structural Variation Identification Results

Posted on: 2022-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lei
GTID: 2480306749458094
Subject: Philosophy of science and technology
Abstract/Summary:
The study of structural variation (SV) is of great importance for understanding the evolution of the human genome, differences in expression among populations, and the causes of disease. In recent years, SV identification methods based on second- and third-generation sequencing data have developed rapidly, each being useful for identifying SVs of different types and lengths. However, the methods used to evaluate SV identification results can still be improved, which would in turn help improve the accuracy of SV identification methods. Therefore, SV_STAT is proposed: a sensitive, fast and intuitive method for evaluating SV detection results.

Firstly, in addition to basic metrics such as true positive (TP) calls, false positive (FP) calls, false negative (FN) calls, precision (Pr), recall (Rc) and F1 score, the deviation of each called variant region from the corresponding region in the benchmark set is quantified by the center distance (d) and the region size ratio (r). The results can thus be evaluated more comprehensively and in more detail, which helps in developing and improving SV identification tools, especially by showing how many SVs were actually called and how precisely they were located.

Secondly, efficiency is also particularly important. When evaluating SV identification results, input files with tens to hundreds of thousands of records are very common. If every record is compared against every other by plain traversal, the time complexity is O(n^2) and the running time reaches several hours or even a dozen hours. To solve this problem, the time complexity is reduced to O(n) by restricting the search for each record to as narrow a range as possible and excluding the many unrelated candidate records.

Thirdly, because SV identification methods and alignment tools differ, the position of an SV in the identification results may deviate considerably from that of the corresponding record in the benchmark set, which causes such records to be classified as false positives. To address this situation, SV_STAT adjusts the matching conditions during evaluation to avoid misclassifying true SVs as far as possible.

Finally, various factors affect the results of SV identification, including the aligner and the sequencing data. In this paper, we use SV_STAT to evaluate the performance of multiple SV detection methods through several sets of experiments, each varying a single factor. The results show that SV_STAT provides a multifaceted analytical basis for improving SV identification methods and a reference for selecting alignment tools, sequencing data and sequencing coverage in SV identification.
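
The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the SV_STAT implementation, and several details are assumptions made only for illustration, because the abstract does not define them: d is taken as the signed offset between region midpoints, r as the called length divided by the benchmark length, and a call matches a benchmark record when both share chromosome and SV type and their midpoints lie within a hypothetical tolerance `max_shift`. Sorting the benchmark records and searching only a narrow window around each call avoids the all-against-all comparison. This shows the general idea of narrowing the search and is not a claim about how SV_STAT reaches its reported O(n).

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str

    @property
    def center(self) -> float:
        return (self.start + self.end) / 2

    @property
    def length(self) -> int:
        # Clamp to 1 so zero-width records (e.g. insertions) cannot divide by zero.
        return max(1, self.end - self.start)


def center_distance(call: SV, truth: SV) -> float:
    """Assumed definition of d: call midpoint minus benchmark midpoint."""
    return call.center - truth.center


def size_ratio(call: SV, truth: SV) -> float:
    """Assumed definition of r: called length divided by benchmark length."""
    return call.length / truth.length


def evaluate(calls, truths, max_shift=200):
    """Match calls to benchmark records and return basic metrics plus (d, r)."""
    # Bucket benchmark records by (chromosome, type) and sort each bucket by center.
    buckets = {}
    for t in truths:
        buckets.setdefault((t.chrom, t.svtype), []).append(t)
    centers = {}
    for key, group in buckets.items():
        group.sort(key=lambda t: t.center)
        centers[key] = [t.center for t in group]

    used = set()   # (bucket key, index) of benchmark records already matched
    pairs = []     # (call, truth, d, r) for every true positive
    for c in calls:
        key = (c.chrom, c.svtype)
        group = buckets.get(key, [])
        # Jump straight to the first benchmark record inside the window.
        i = bisect_left(centers.get(key, []), c.center - max_shift)
        best = None
        while i < len(group) and group[i].center <= c.center + max_shift:
            if (key, i) not in used:
                if best is None or abs(center_distance(c, group[i])) < abs(
                    center_distance(c, group[best])
                ):
                    best = i
            i += 1
        if best is not None:
            used.add((key, best))
            t = group[best]
            pairs.append((c, t, center_distance(c, t), size_ratio(c, t)))

    tp = len(pairs)
    fp = len(calls) - tp
    fn = len(truths) - tp
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "Pr": precision,
            "Rc": recall, "F1": f1, "pairs": pairs}


if __name__ == "__main__":
    truths = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 5600, "DUP"),
              SV("chr2", 300, 900, "DEL")]
    calls = [SV("chr1", 1050, 2150, "DEL"), SV("chr1", 9000, 9500, "DEL")]
    print(evaluate(calls, truths))
```

Widening `max_shift` in this sketch is one simple way to relax the matching condition so that true SVs reported at shifted positions are not counted as false positives. The adjustment that SV_STAT actually applies is not specified in the abstract.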
Keywords: structural variation, identification, evaluation, benchmark set