The development of next-generation sequencing (NGS) technology has advanced genomics research in many application domains. Metagenomics is one such powerful approach for studying large communities of microbial species. For unknown species in metagenomic samples, gene assembly and identification without a reference genome is a very challenging problem. In addition, with the rapid development of gene sequencing technology, the volume of genomic sequencing data is growing rapidly. To address these issues, distributed gene assembly software that handles multiple metagenome samples can be used. In this thesis, building on the highly scalable gene assembly software SWAP-Assembler, we optimize several stages of metagenome assembly analysis and propose a new gene prediction and deduplication method based on Union-Find, all of which achieved good results. We also present a workflow called WFswap that assembles large genomic datasets drawn from many samples and identifies more genes. Our computational analysis shows that WFswap delivers better performance: it assembles longer genes and recovers more benchmark genes. Finally, we improve and optimize the functionality of SWAP-Assembler, further extending contigs and scaffolds and thereby improving the N50 of the assembly.
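
For readers unfamiliar with the N50 metric mentioned above: it is the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches at least half of the total assembly length. The following is a minimal Python sketch of that standard definition, assuming the contig lengths are already available as integers. The function name `n50` is illustrative only and is not part of SWAP-Assembler or WFswap.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    Hypothetical helper for illustration; not taken from SWAP-Assembler.
    """
    lengths = sorted(contig_lengths, reverse=True)  # longest contig first
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the cumulative length
        # covers at least half of the total assembly length.
        if 2 * running >= total:
            return length

    # Not reachable for positive lengths; kept as a defensive fallback.
    return lengths[-1]


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Under this definition, extending contigs or scaffolds so that fewer, longer sequences cover the same total length raises the N50, which is why the metric is commonly used to compare assemblies.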