Optimization On Genomic Big Data Assembly

Posted on: 2020-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Zheng
Full Text: PDF
GTID: 2370330596464250
Subject: Computer technology
Abstract/Summary:
The development of next-generation sequencing (NGS) technology has advanced genomics research in many application domains. Metagenomics is one such powerful approach for studying large communities of microbial species. For the unknown species in metagenomic samples, gene assembly and identification without a reference genome is a very challenging problem. In addition, with the rapid development of gene sequencing technology, the volume of genomic sequencing data is growing rapidly. To overcome these issues, distributed gene assembly software that handles multiple metagenome samples can be used. In this thesis, building on the highly scalable gene assembly software SWAP-Assembler, we optimize several stages of metagenome assembly analysis and propose a new gene prediction and deduplication method based on Union-Find. These optimizations all produced good results. We also present a workflow called WFswap that assembles large genomic data sets drawn from many samples and identifies more genes. Our computational analysis shows that WFswap performs better: it assembles longer genes and recovers more benchmark genes. Finally, we improved and optimized the functionality of SWAP-Assembler, further extending contigs and scaffolds and raising the N50 of the resulting assemblies.
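The abstract does not describe how the Union-Find based deduplication works, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes that some earlier step (not shown) has already produced pairs of predicted genes judged to be duplicates, and it assumes the longest sequence in each group is the one to keep. The names `UnionFind`, `deduplicate`, and `duplicate_pairs` are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Follow parent links to the root, shortening the path on the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def deduplicate(genes, duplicate_pairs):
    """Keep one representative per group of duplicate genes.

    genes: list of gene sequences (strings).
    duplicate_pairs: (i, j) index pairs flagged as duplicates by an
        external comparison step (an assumption of this sketch).
    Returns the longest sequence of each group, in original index order.
    """
    uf = UnionFind(len(genes))
    for i, j in duplicate_pairs:
        uf.union(i, j)

    best = {}  # group root -> index of the longest gene seen so far
    for idx, seq in enumerate(genes):
        root = uf.find(idx)
        if root not in best or len(seq) > len(genes[best[root]]):
            best[root] = idx
    return [genes[i] for i in sorted(best.values())]
```

Because duplicate relations are merged transitively, genes 0 and 3 end up in the same group whenever both are paired with gene 1, even if they were never compared directly; this is the property that makes Union-Find a natural fit when duplicate pairs are found across many samples.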
Keywords/Search Tags:big data, gene assembly, gene prediction, Union-Find, multiple samples