
Research On Human Genome Indel And Structural Variants Detection And Analysis Approaches

Posted on: 2014-06-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Jiang
Full Text: PDF
GTID: 1260330392472600
Subject: Computer application technology
Abstract/Summary:
The development of high-throughput sequencing (HTS) has advanced research on human genomes, in which the detection and analysis of indels and structural variants (SVs) are important topics. Indels and SVs make reads difficult to map, and therefore make the adjacent genome sequence hard to analyze. Detecting and analyzing indels and SVs in HTS data thus poses challenges for bioinformatics research. This dissertation focuses on the difficulties and open problems in human genome indel and SV research, discusses the key issues, and studies solutions. The main content includes:

(1) Built an indel number estimation model for the human genome based on data from different sequencing technologies. A comparison of indel detection using 36 bp reads and 100 bp reads reveals that the indel number given by the 1000 Genomes Project, which was derived from 36 bp reads, is on the low side. On this basis, the lower bound of the total indel number in NA18507 was estimated.

(2) Studied methods for constructing simulated data and a model for evaluating experiment results. The methods and model were implemented as a set of simulation experiment tools for indel and SV detection. This work includes: a) a method for constructing a donor genome; b) a method for generating the expected alignment locations and forms for each simulated read; c) a model for evaluating SV detection results, split read alignment results and discordant pair alignment results; d) a method for extracting the actual alignments and expected alignments of incorrectly aligned reads.
A simulation evaluation experiment on several recent read mapping and SV detection methods shows that these simulation and evaluation methods can reflect the mapping abilities of different types of methods in terms of discordant pair alignment and split read mapping, aspects that have been ignored in existing research.

(3) Developed a new split read mapping algorithm based on a fixed gap penalty for the longest gap in the alignment. This algorithm controls the alignment scores of split read alignments by assigning a fixed penalty to the longest gap, and controls the total number of gaps by assigning gap-length-related penalties to the other, smaller gaps. The experimental results show that this algorithm has better split read mapping ability than existing algorithms.

(4) Developed a new SV detection method combining discordant pair analysis and split read mapping. This method uses discordant pair analysis to conduct split read mapping in a reduced search space and calls indels and SVs from the split mappings. The method has been implemented and tested thoroughly with simulation experiments, real data experiments and biological experimental validation. The results show that this method has better running efficiency and more comprehensive SV detection abilities than existing methods. Finally, based on these results, the effects of coverage and repeats on variant detection are analyzed.
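
The gap penalty scheme described in item (3) can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula or parameter values, so everything below is an assumption for illustration: the function names, the affine form (open plus extend) used for the smaller gaps, and all numeric values are hypothetical and are not taken from the dissertation.

```python
from typing import Sequence


def split_read_gap_penalty(
    gap_lengths: Sequence[int],
    fixed_longest_penalty: float = 10.0,  # hypothetical value
    gap_open: float = 4.0,                # hypothetical value
    gap_extend: float = 1.0,              # hypothetical value
) -> float:
    """Total gap penalty for one alignment.

    The single longest gap costs a fixed amount regardless of its length;
    every other gap costs an amount that grows with its length (an affine
    penalty is assumed here).
    """
    gaps = [g for g in gap_lengths if g > 0]
    if not gaps:
        return 0.0
    # Index of the longest gap; ties resolve to the first occurrence.
    longest_index = max(range(len(gaps)), key=lambda i: gaps[i])
    penalty = fixed_longest_penalty
    for i, length in enumerate(gaps):
        if i != longest_index:
            penalty += gap_open + gap_extend * length
    return penalty


def alignment_score(
    matches: int,
    mismatches: int,
    gap_lengths: Sequence[int],
    match_score: float = 1.0,       # hypothetical value
    mismatch_penalty: float = 3.0,  # hypothetical value
) -> float:
    """Score of a (possibly split) read alignment under the sketch above."""
    return (
        matches * match_score
        - mismatches * mismatch_penalty
        - split_read_gap_penalty(gap_lengths)
    )
```

Under these assumed parameters, a read split across a 5,000 bp deletion pays the same gap cost as one split across a 50 bp deletion, so long split alignments are not scored out of contention, while each additional small gap adds a length-dependent cost that limits the total number of gaps.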
Keywords/Search Tags: Structural variant, Split read mapping, High-throughput sequencing, Simulated genome, Indel number, Repeat