Design And Analysis Of A Viral Quasispecies Haplotype Reconstruction Optimization Algorithm

Posted on: 2016-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Wu
Full Text: PDF
GTID: 2180330461488485
Subject: Software engineering
Abstract/Summary:
A viral quasispecies is a population of closely related viral variants that compete within a highly mutagenic environment; their nucleic acid sequences are highly similar. Reconstructing the sequences of the different haplotypes in a viral quasispecies is of great significance for investigating the relationship between quasispecies diversity and disease epidemiology, and for developing more effective therapeutic treatments. The development of high-throughput sequencing technology provides a new way to study viral quasispecies. The viral quasispecies haplotype reconstruction problem is to use high-throughput sequencing reads to reconstruct the haplotype of each strain in the quasispecies. Because high-throughput sequencing produces a very large number of reads, and those reads contain many sequencing errors, reconstructing the haplotypes of a viral quasispecies is a major challenge.

To address this problem, this thesis designs an optimized algorithm for viral quasispecies haplotype reconstruction. First, the algorithm combines several read-screening methods to filter out low-quality reads. Second, it combines an algorithm based on a Poisson distribution model with a clustering method based on Hamming distance to further correct the sequencing reads. Third, it applies a global reconstruction method based on a multinomial distribution model to reconstruct the quasispecies haplotypes from the corrected reads. Finally, it estimates the frequencies of the reconstructed haplotypes with a clustering algorithm. In extensive experiments, the algorithm outperformed QuasQ and QuRe in the number of haplotypes reconstructed, accuracy, and F-measure.

Existing methods for generating simulated data rely on simple base mutation methods and simple mutation distribution models. To overcome this limitation, this thesis designs a generator for simulating viral quasispecies sequences. The generator is based on a strain mutation distribution model and a strain frequency distribution model, and it uses the ART sequencing simulation tool to generate sequences, so that it better simulates the genetic mutations and the frequency distribution of the strains in a viral quasispecies. A visualization platform was also developed for the generator. The simulated data produced by the platform are easy to inspect, export, and save, which is convenient for subsequent research.
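
The abstract mentions read correction by Hamming-distance clustering but gives no details. The following is a minimal sketch of one common way such a step can work, not the method used in the thesis: reads of equal length are greedily grouped around seed reads, and each read is replaced by the column-wise majority consensus of its group. The function names (`hamming`, `cluster_reads`, `consensus`, `correct_reads`) and the threshold `max_dist` are assumptions made for illustration, and the Poisson-model component is not shown.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length reads differ."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_dist):
    """Greedy clustering (illustrative assumption): each read joins the
    first cluster whose seed read is within max_dist mismatches;
    otherwise it starts a new cluster with itself as the seed."""
    clusters = []  # each cluster is a list whose first element is the seed
    for read in reads:
        for cluster in clusters:
            if hamming(read, cluster[0]) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def consensus(cluster):
    """Column-wise majority base over all reads in one cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


def correct_reads(reads, max_dist=1):
    """Replace every read by the consensus of its cluster.
    The output is grouped by cluster, not in the input order."""
    corrected = []
    for cluster in cluster_reads(reads, max_dist):
        corrected.extend([consensus(cluster)] * len(cluster))
    return corrected


if __name__ == "__main__":
    # Toy reads: the third read carries a single mismatch in its last base.
    reads = ["ACGT", "ACGT", "ACGA", "TTTT", "TTTT"]
    print(correct_reads(reads, max_dist=1))
```

In this toy input, the read `ACGA` falls into the same cluster as the two `ACGT` reads and is replaced by their consensus. A greedy seed-based scheme depends on read order and on the threshold; real data would also need alignment to a common coordinate system before Hamming distance is meaningful.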
Keywords/Search Tags: viral quasispecies, High-Throughput Sequencing, sequencing error correction, haplotype reconstruction