Research On The Analysis Method Of Beta Diversity Of Metagenomic Samples

Posted on: 2022-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
GTID: 2510306566490974
Subject: Software engineering
Abstract/Summary:
Most microorganisms in nature exist as part of a microbiome, and the structure and function of a microbial community are closely related to the state of its symbiotic environment. Whole-genome sequencing (WGS) yields detailed genomic information and high-resolution taxonomic information from the microbiome. Beta-diversity analysis across a large number of microbiomes reveals the links between microbial community structure, metabolic function, and metadata such as environmental conditions or health status. An accurate and reliable distance (or dissimilarity) among microbiomes is fundamental to deducing their beta diversity.

However, current methods for computing the structural distance between metagenomic samples either ignore the evolutionary relationships among species or fail to account for unclassified organisms that cannot be mapped to any definite tip node in the phylogenetic tree, resulting in an erroneous beta-diversity pattern. In addition, when measuring microbiome distance from the functional aspect, current methods ignore the inherent relationships among functional gene families and thus produce erroneous distances. Furthermore, the throughput of most existing multi-dimensional scaling methods, such as Principal Coordinates Analysis (PCoA), is limited by computing efficiency, which hinders data mining on a much broader scale.

To solve these problems, we propose the Dynamic Meta-Storms (DMS) algorithm, which enables comprehensive comparison of metagenomes at the species level using both taxonomy and phylogeny profiles. It compares the identified species of metagenomes using the phylogeny, and then dynamically places the unclassified species at virtual nodes of the phylogenetic tree according to their higher-level taxonomy information. We also propose Hierarchical Meta-Storms (HMS) to comprehensively measure functional distances among microbiomes using a multi-level metabolic hierarchy and the distance at every pathway level. Finally, we optimize Parallel-PCoA, a PCoA implementation based on parallel computing that can rapidly resolve the beta-diversity pattern for thousands of samples.

DMS, HMS and Parallel-PCoA are implemented in C/C++ with OpenMP for parallelization and optimization, which gives high speed and low memory use. All tests were completed on a single non-shared computing node with 80 threads. DMS takes the species-level profiles of metagenomes as input and generates their pairwise distance matrix. It computes the pairwise distance matrix of 100,000 synthetic shotgun metagenomes in 6.4 h, which is 20% faster than the benchmark method while using over 40% less memory. HMS takes microbiome functional profiles as input and computes the pairwise distance matrix for 20,000 microbiomes in 73 minutes on the same computing node, which is 36 times faster than the benchmark methods while reducing peak RAM usage by over 82%. Parallel-PCoA takes a structural or functional distance matrix of samples (e.g. the output of DMS or HMS) and generates PCoA coordinates in a lower-dimensional (e.g. 2- or 3-dimensional) coordinate system to visualize the beta diversity between samples. It completes the PCoA for 20,000 microbiomes in 161 minutes on the same computing node, which is 17 times faster and uses 80% less RAM than existing methods. Thus DMS, HMS and Parallel-PCoA enable in-depth, high-resolution data mining among microbiomes.
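To make the PCoA step concrete, the following is a minimal single-threaded sketch of classical PCoA in Python. It is not the Parallel-PCoA implementation described in the abstract, which is written in C/C++ with OpenMP. The function names (`double_center`, `dominant_eigenpair`, `pcoa`) are hypothetical, and power iteration with deflation is swapped in for a full eigensolver to keep the example dependency-free. It assumes the input is a symmetric distance matrix that can be embedded in Euclidean space, so the centered matrix has no significant negative eigenvalues.

```python
import math


def double_center(dist):
    """Gower centering: B = -1/2 * J * D^2 * J, where J = I - (1/n) * ones."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # D^2 is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def mat_vec(mat, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def dominant_eigenpair(mat, max_iters=1000, tol=1e-13):
    """Largest-magnitude eigenpair of a symmetric matrix by power iteration."""
    n = len(mat)
    # Deterministic, non-constant start vector (the constant vector lies in
    # the null space of a double-centered matrix).
    vec = [1.0 + 0.1 * i for i in range(n)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(max_iters):
        nxt = mat_vec(mat, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-15:
            return 0.0, vec  # matrix is numerically zero along this vector
        nxt = [x / norm for x in nxt]
        change = max(abs(a - b) for a, b in zip(nxt, vec))
        vec = nxt
        if change < tol:
            break
    # Rayleigh quotient gives the eigenvalue estimate for a unit vector.
    value = sum(v * w for v, w in zip(vec, mat_vec(mat, vec)))
    return value, vec


def pcoa(dist, n_axes=2):
    """Return n_axes principal coordinates for each sample in dist."""
    n = len(dist)
    b = double_center(dist)
    coords = [[0.0] * n_axes for _ in range(n)]
    for axis in range(n_axes):
        value, vec = dominant_eigenpair(b)
        if value <= 1e-12:
            break  # no remaining positive variance to explain
        scale = math.sqrt(value)
        for i in range(n):
            coords[i][axis] = scale * vec[i]
        # Deflation: remove the component just found before the next axis.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords
```

Calling `pcoa(distance_matrix, 2)` returns one 2-D coordinate pair per sample. The double-centering and matrix-vector products are the parts that dominate run time for large sample counts, which is presumably where a parallel implementation would gain most, though the abstract does not detail how Parallel-PCoA distributes the work.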
Keywords/Search Tags:Dynamic Meta-Storms(DMS), Hierarchical Meta-Storms(HMS), microbiome, bioinformatic algorithm, beta-diversity