Genome Sequence Repetitiveness Quantification And De Novo Repeat Detection

Posted on: 2022-08-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Feng
Full Text: PDF
GTID: 1480306545467854
Subject: Bioinformatics
Abstract/Summary:
DNA repeats are widely distributed across species and have been shown to play an important role in genome regulation and evolution. Fast and accurate identification of repeats in a genome remains a challenging task in bioinformatics. To make up for the shortcomings of library-based methods, some k-mer counting-based tools calculate repetitive scores from k-mer frequencies to identify various repeats. Although these tools are efficient in time and memory usage and perform well, they can still be improved in repetitive score calculation, repeat boundary discrimination, and sensitivity to segmental duplications (SDs). Moreover, although these tools provide ways to quantify the degree of repetitiveness, they apply them only to repeat identification, and the quantitative features they contain have not been studied in depth. We therefore propose a new method based on weighted k-mer coverage and apply it to de novo repeat detection and to quantitative and comparative genomics analysis. The main results of this study are as follows: (1) we propose a more intuitive and accurate approach to quantifying sequence repetitiveness and construct a genome repetitiveness map for human; (2) we find correlations between sequence repetitiveness and various genome features; (3) we develop an efficient tool, RepLoc, for de novo repeat detection, which adopts a new locating and merging method that improves detection sensitivity and specificity; (4) we propose a new method, RDis, for calculating evolutionary distance based on cross-species repetitiveness maps, and a case study shows that RDis can avoid the influence of genome rearrangement to a certain extent; (5) we provide an online platform for visualizing, analyzing, and comparing genome repetitiveness maps (http://bis.zju.edu.cn/reploc/). In this study, we quantify the sequence repetitiveness of any region of the genome from a new perspective and develop an efficient repeat identification tool based on the repetitiveness map. In addition, we explore the quantitative relationships between sequence repetitiveness and various genome features and conduct cross-species genome repetitiveness analysis, which may provide new insights for more in-depth genome research.
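To make the idea of a k-mer frequency-based repetitive score concrete, the following is a minimal sketch of the general technique. It is not the weighted k-mer coverage formula of this dissertation, which the abstract does not specify, and it is not the RepLoc implementation. The function names (kmer_counts, repetitiveness_profile, repeat_regions), the unweighted averaging of k-mer counts, and the fixed threshold are all assumptions made for illustration only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repetitiveness_profile(seq, k):
    """Per-base score: mean count of the k-mers covering each position.

    Assumption: a plain (unweighted) average. A score of 1.0 means every
    k-mer covering that base occurs exactly once in seq.
    """
    counts = kmer_counts(seq, k)
    totals = [0] * len(seq)   # sum of counts of covering k-mers
    cover = [0] * len(seq)    # number of k-mers covering each base
    for i in range(len(seq) - k + 1):
        c = counts[seq[i:i + k]]
        for j in range(i, i + k):
            totals[j] += c
            cover[j] += 1
    return [t / n if n else 0.0 for t, n in zip(totals, cover)]


def repeat_regions(profile, threshold):
    """Merge consecutive bases scoring >= threshold into half-open intervals."""
    regions, start = [], None
    for i, score in enumerate(profile):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(profile)))
    return regions


if __name__ == "__main__":
    # Toy sequence: a short homopolymer run followed by unique sequence.
    profile = repetitiveness_profile("AAAACGT", k=2)
    print(profile)
    print(repeat_regions(profile, threshold=2))
```

In this toy version the per-base profile plays the role of a repetitiveness map, and the thresholded intervals play the role of candidate repeats. Real tools work on whole genomes with much larger k and more careful scoring and boundary handling.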
Keywords/Search Tags:weighted k-mer coverage, repetitiveness, repeat detection, comparative genomics