
Bioinformatics Analysis Of The Chimeric Sequences Generated In Multiple Displacement Amplification And Its Potential Use In Haplotype Assembling

Posted on: 2017-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Guo
GTID: 2310330491962526
Subject: Biomedical engineering
Abstract/Summary:
Whole genome amplification (WGA) based on multiple displacement amplification (MDA) has been widely used in recent genomic research. The phi29 DNA polymerase is used for the amplification because of its clear advantages: it generates DNA fragments longer than 10 kbp from microscale quantities of initial DNA, and it offers high fidelity, good proofreading activity and low amplification bias, overcoming the shortcomings of traditional amplification methods. However, few reports have focused on the disadvantages of the phi29 DNA polymerase, which has obscured its real properties. In this research, we downloaded more than 200 GB of sequence data that had been generated on a Next-Generation Sequencing (NGS) platform for whole genome haplotyping. We concentrated on the generation of chimeric sequences (i.e. chimeras) during phi29MDA and aimed to compile systematic statistics on their proportion, classification and distribution. Moreover, we attempted to explain how chimeric sequences are generated in phi29MDA in terms of thermodynamics or kinetics. In addition, through experimental and bioinformatics analysis, we demonstrated that the special structural characteristics of chimeric sequences make them potentially useful in whole-genome haplotyping of Homo sapiens.

Our results included:
(1) A bioinformatics pipeline based on data from the Illumina HiSeq sequencing platform was constructed to count the total number of chimeras, their proportion in the whole sequence data set and the number of each kind of chimera. In this series of phi29MDA sequence data, the total proportion of chimeras was substantial (almost 6%).
(2) Chimeras are classified as wasted data in a normal alignment pipeline. Accordingly, a small-scale bioinformatics pipeline, subsequently simplified and improved, was constructed to recycle chimeras, which improved the utilization efficiency of the sequence data.
(3) Focusing on two key statistical indexes of the chimeric structure, graphics software was used in the data analysis pipeline to visualize the distribution profile of the chimeras. This helps explain the generation of chimeras in terms of thermodynamics or kinetics, i.e. it interprets the characteristics of phi29 DNA polymerase through NGS data analysis.
(4) Analysis of phi29MDA sequence data from two different kinds of E. coli showed that both segments in a chimera were molecularly homologous. Meanwhile, according to the statistics obtained earlier in this research, their physical distance could be as long as 5 kbp. This phenomenon demonstrates the potential of chimeras in whole genome haplotyping. On top of the conventional approach to haplotype assembly, chimeras could probably further improve some haplotyping assessment indexes, e.g. N50 and average scaffold length.
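For readers unfamiliar with the N50 index mentioned in result (4), the following is a minimal sketch of the standard definition: the largest length L such that blocks of length at least L together cover at least half of the total assembled length. The function name `n50` and the block lengths used below are illustrative assumptions; they are not taken from the thesis or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths (0 for empty input).

    Standard definition: sort lengths in descending order and return the
    length at which the running sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical haplotype block lengths (arbitrary units), for illustration only.
blocks_before = [10, 8, 6, 4, 2]
# Same total length, assuming one extra link joins the blocks of length 6 and 4.
blocks_after = [10, 10, 8, 2]

print(n50(blocks_before))
print(n50(blocks_after))
```

In this made-up example, joining two blocks leaves the total length unchanged but raises N50, which is the kind of improvement the abstract suggests chimeras might contribute to haplotype assembly.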
Keywords/Search Tags: Whole genome amplification, Multiple displacement amplification, phi29 DNA polymerase, Next generation sequencing data, Chimeric sequences, Haplotype assembling