
Research On Genome Missembly Classification Method Based On Paired-end Reads Data

Posted on: 2024-03-11 | Degree: Master | Type: Thesis
Country: China | Candidate: J L Gao
GTID: 2530306917965569 | Subject: Computer Science and Technology
Abstract/Summary:
DNA carries the genetic instructions that guide the development and functioning of organisms, so obtaining complete and accurate DNA sequences is important for life science research. Next-generation sequencing (NGS) technology can sequence DNA molecules at low cost and in a short time, but because genomes are complex and reads are short, assembled sequences may contain assembly errors. Although many methods have been proposed to detect assembly errors, their results include false discoveries, i.e., regions that are in fact correctly assembled. This thesis proposes misClas, an evaluator of misassembly detection results that uses clustering analysis to separate correct assemblies from assembly errors among the reported regions, based on features extracted from the alignments of paired-end reads to the assembled sequences. The main research tasks are as follows:

(1) Select and quantify the features of assembly errors. misClas characterizes each region reported by a misassembly detection tool by five features: abnormal coverage, abundant indels, incorrect pair orientation, unreasonable insert size, and inter-contig read pairs. These features are used to separate false discoveries from real assembly errors, and each one is quantified so that its value represents the intensity of the abnormality.
(2) Perform clustering analysis on the assembly error detection results. To accurately distinguish correct assemblies from assembly errors in the detection results, the features are analyzed jointly rather than one at a time.
(3) Evaluate the performance of misClas. Several misassembly detection tools were run on different paired-end datasets, and their results were classified by misClas, so that its performance could be validated comprehensively and accurately.

By classifying misassembly detection results through clustering analysis of paired-read alignments to the assembled sequences, misClas improves the precision of the detection results, which benefits subsequent genomic studies such as the analysis of operon/regulon structure, structural variation (SV) detection, and gene annotation.
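The abstract does not state which clustering algorithm misClas uses or exactly how each feature is quantified. As a rough illustration only, the sketch below assumes the five feature values for each reported region have already been computed from the paired-end alignments, rescales them to [0, 1], and splits the regions with two-means (k-means with k = 2), labeling the more abnormal cluster as real misassemblies. The `Region` fields and the functions `min_max_scale`, `kmeans_two`, and `classify` are hypothetical names chosen for this example, not the thesis's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Region:
    """A region flagged by a misassembly detector (hypothetical feature schema)."""
    name: str
    coverage_dev: float  # relative deviation of read coverage from the expected depth
    indel_rate: float    # indels per aligned base inside the region
    bad_orient: float    # fraction of read pairs with incorrect relative orientation
    bad_insert: float    # fraction of pairs whose insert size is outside the library range
    inter_contig: float  # fraction of pairs whose mates align to a different contig

    def vector(self) -> List[float]:
        return [self.coverage_dev, self.indel_rate, self.bad_orient,
                self.bad_insert, self.inter_contig]


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Rescale each feature column to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


def kmeans_two(points: List[List[float]], max_iter: int = 100) -> List[int]:
    """Plain 2-means clustering; returns a cluster label (0 or 1) per point."""
    # Deterministic seeding: the least abnormal and the most abnormal points.
    order = sorted(range(len(points)), key=lambda i: sum(points[i]))
    centers = [list(points[order[0]]), list(points[order[-1]])]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest center (Euclidean distance).
        new_labels = [min((0, 1), key=lambda k: math.dist(p, centers[k])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def classify(regions: List[Region]) -> Dict[str, str]:
    """Label each reported region as 'misassembly' or 'false discovery'."""
    if len(regions) < 2:
        raise ValueError("need at least two regions to cluster")
    points = min_max_scale([r.vector() for r in regions])
    labels = kmeans_two(points)

    def mean_intensity(k: int) -> float:
        # Mean summed (scaled) abnormality of cluster k; -1 if the cluster is empty.
        vals = [sum(p) for p, lab in zip(points, labels) if lab == k]
        return sum(vals) / len(vals) if vals else -1.0

    # Assumption: the cluster with the higher mean abnormality holds the real errors.
    error_cluster = max((0, 1), key=mean_intensity)
    return {r.name: ("misassembly" if lab == error_cluster else "false discovery")
            for r, lab in zip(regions, labels)}


if __name__ == "__main__":
    demo = [
        Region("r1", 0.05, 0.001, 0.01, 0.02, 0.00),
        Region("r2", 0.80, 0.020, 0.35, 0.40, 0.30),
        Region("r3", 0.08, 0.001, 0.02, 0.01, 0.00),
    ]
    for name, label in classify(demo).items():
        print(name, label)
```

Min-max scaling keeps numerically small features such as the indel rate from being swamped by coverage deviation in the Euclidean distance, and seeding the two centers with the least and most abnormal regions makes the result deterministic. With real data, a learned or density-based clustering method might separate the two groups better; this sketch only shows the general shape of feature-based clustering of detector output.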
Keywords/Search Tags: genome assembly, cluster analysis, misassembly, paired-end reads