Optimization On Genomic Big Data Assembly

Posted on:2020-10-22

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Zheng

Full Text:PDF

GTID:2370330596464250

Subject:Computer technology

Abstract/Summary:


The development of next-generation sequencing (NGS) technology has advanced genomics research in many application domains. Metagenomics is one such powerful approach for studying large communities of microbial species. For the unknown species in metagenomic samples, gene assembly and identification without a reference genome is a very challenging problem. In addition, with the rapid development of gene sequencing technology, the volume of genomic sequencing data is growing rapidly. To address these issues, distributed gene assembly software that handles multiple metagenome samples can be used. In this thesis, building on the highly scalable gene assembly software SWAP-Assembler, we optimize several stages of metagenome assembly analysis and propose a new gene prediction and deduplication method based on Union-Find, both of which produced good results. We also present a workflow called WFswap that assembles large genomic data sets from many samples and identifies more genes. Our computational analysis shows that WFswap achieves better performance: it assembles longer genes and recovers more benchmark genes. Finally, we improved and optimized the functionality of SWAP-Assembler, further extending contigs and scaffolds and improving the N50 of the resulting assemblies.
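The abstract does not describe how the Union-Find deduplication works, so the following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes that predicted genes are given as plain sequence strings, that some earlier step has already produced pairs of indices for genes judged to be duplicates, and that the longest sequence in each group is kept as the representative. The names `UnionFind`, `deduplicate`, `genes`, and `similar_pairs` are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.size = [1] * n

    def find(self, x):
        # Walk up to the root, halving the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def deduplicate(genes, similar_pairs):
    """Keep one representative (the longest sequence) per duplicate group.

    genes: list of sequence strings (assumed input format).
    similar_pairs: iterable of (i, j) index pairs judged to be duplicates
                   by an earlier comparison step (assumed to exist).
    """
    uf = UnionFind(len(genes))
    for i, j in similar_pairs:
        uf.union(i, j)

    best = {}  # root index -> index of the longest gene in that group
    for idx, seq in enumerate(genes):
        root = uf.find(idx)
        if root not in best or len(seq) > len(genes[best[root]]):
            best[root] = idx
    # Return representatives in their original input order.
    return [genes[i] for i in sorted(best.values())]


if __name__ == "__main__":
    genes = ["ATGAAA", "ATGAAATTT", "ATGCCC", "ATGAAAT"]
    similar_pairs = [(0, 1), (1, 3)]
    print(deduplicate(genes, similar_pairs))
```

Because duplicate relations are merged transitively, genes 0, 1, and 3 in the usage example fall into one group even though 0 and 3 were never compared directly. This transitive grouping is the property that makes Union-Find a natural fit for merging gene predictions across many samples.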

Keywords/Search Tags:

big data, gene assembly, gene prediction, Union-Find, multiple samples


Related items

1	Research Of Protein Function Prediction Based On The Gene Ontology Structure
2	Research On Genome Assembly And Prediction Based On Deep Learning
3	Research On Construction Methods Of Gene Catalogue Of Metagenome And Their Application
4	Improving Gene Structure Prediction By Combining Multiple Sources Of Evidence
5	Analysis Of The ABC Transporter Family Gene Homologous Sequence Ctg16 Of Phanerochaete Chrysosporium
6	The Analysis Methods Of Gene Prediction And Long Noncoding RNA Identification With RNA-Seq
7	Evaluation Of MicroRNA Target Prediction Programs And Analysis Of The Features Of Target Gene
8	The Prediction And Analysis Of Functional Genes Of Multiple Microbes
9	Research On 2D Spatial Gene Selection Algorithm Based On Unbalanced Gene Data
10	Representation-learning-based Algorithms Of Predicting MicroRNA And Gene Relationships