
The Implementation Of Metagenome Sequencing Assembly Based On De Bruijn Graph Algorithm

Posted on: 2018-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Zhou
Full Text: PDF
GTID: 2310330512486429
Subject: Computer technology
Abstract/Summary:
The development of DNA sequencing technology has been a huge stimulus for new ways to create and test hypotheses in biology, and to explore the phenomena and laws of life from a novel and vastly enhanced perspective. Metagenomics is a new science that investigates microbial communities by analyzing the microbial DNA extracted from environmental samples, without the need for pure culture in the laboratory. This large-scale genomic technology has greatly facilitated in-depth analysis of microbiology in various environments. Determining the genomic sequences of species is a prerequisite to understanding their biology. Because DNA sequencing technology cannot read the whole genomes in model microbial communities, genome sequence assembly has always been a basic problem of bioinformatics. However, research on metagenomic assembly algorithms is still at an early stage. Most metagenomic assembly projects still rely on single-genome assembly algorithms. Unfortunately, most single-genome assembly algorithms have limitations on metagenomic data. Therefore, this thesis focuses on the study of metagenomic assembly algorithms. The main work of this thesis is to add a new module for metagenomic data, yielding Meta-ARCS, which is built on the single-genome assembly software ARCS. We propose a novel genome assembly algorithm using second-generation sequencing data and develop a new genome assembly tool. The insert length of paired-end reads is much longer than the length of a single read. Paired-end reads can therefore be used more efficiently to untangle the complicated de Bruijn graph, generating much longer contigs and scaffolds. To address the problem of filling gaps in scaffolds, we remove the gaps and, to obtain higher accuracy for short contigs, bin the contigs using their coverage information. Each bin of contigs belongs to a single species. Experiments on real and simulated data sets show that Meta-ARCS achieves better results than existing software.
Keywords/Search Tags: Metagenomics, second-generation sequencing technology, de Bruijn graph, contigs, scaffolds, metagenomic assembly
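
For readers unfamiliar with the de Bruijn graph approach named in the abstract, the following is a minimal illustrative sketch in Python. It is not the ARCS or Meta-ARCS implementation. It assumes error-free reads from a single strand, ignores reverse complements and paired-end information, and uses hypothetical function names (build_de_bruijn, contigs). Nodes are (k-1)-mers, each k-mer contributes one edge, and the edge count serves as a simple stand-in for the coverage information that the abstract says is used for binning.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph as {prefix: {suffix: count}}.

    Each k-mer in a read adds an edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix; the count records how often that k-mer was seen.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def contigs(graph):
    """Spell every maximal non-branching path as a contig.

    Simplification for illustration: isolated cycles are not reported.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for u, successors in graph.items():
        outdeg[u] = len(successors)
        for v in successors:
            indeg[v] += 1
    nodes = sorted(set(graph) | set(indeg))

    def is_inner(node):
        # An inner node has exactly one way in and one way out.
        return indeg[node] == 1 and outdeg[node] == 1

    result = []
    for u in nodes:
        if is_inner(u):
            continue  # paths start only at branching or end nodes
        for v in sorted(graph.get(u, {})):
            seq = u + v[-1]
            while is_inner(v):
                v = next(iter(graph[v]))
                seq += v[-1]
            result.append(seq)
    return result


if __name__ == "__main__":
    # Three overlapping toy reads (made up for this example).
    reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
    g = build_de_bruijn(reads, k=4)
    print(contigs(g))
```

In real metagenomic data, repeats and sequences shared between species create branching nodes that split contigs. Resolving those branches is the step for which the abstract says paired-end reads are used.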