Study On The Effectiveness Of Bacteria Based On Peptide Component Specific Classification

Posted on: 2016-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: J K Li
GTID: 2270330473962318
Subject: Theoretical Physics
Abstract/Summary:
Traditionally, phylogenetic relationships have been inferred mainly from phenotypic characteristics such as living environment, external morphology, physiology, and metabolic pathways, but these criteria are of little use for classifying microorganisms. Advances in molecular biology, and in particular the steady improvement of sequencing technology, have made it possible to infer phylogenetic relationships from nucleotide or amino acid sequences. A major breakthrough came from Carl Woese and coworkers, who used small subunit ribosomal RNA (SSU rRNA) to delineate species and found that the Archaea form a third distinct domain of the tree of life. However, this single-gene alignment method has been strongly questioned because its results are unstable. Now that whole genomes have been sequenced, several genome-scale approaches to inferring phylogenetic relationships have been attempted, based on gene order, gene content, conserved gene pairs, and so on, and these have gradually formed the field known as phylogenomics. Nevertheless, all of these methods inherently rely on sequence alignment and are easily affected by factors such as genome size, horizontal gene transfer, parallel gene loss, and gene evolution rate. As a result, their criteria have poor specificity and resolving power, and they cannot correctly infer the phylogenetic relationships of the sequenced species.

Recently, Hao Bailin's research group proposed using the n-mers of the whole genome or proteome to infer phylogenetic relationships. This alignment-free method can correctly determine the phylogenetic relationships of microorganisms and overcomes many limitations of the earlier methods, and it has attracted wide attention. Studies of the n-mer composition vector show that the phylogenetic relationships of bacteria are recovered correctly only at an appropriate string length.
For example, when n is 5 or 6, the peptide composition vector can reconstruct the phylogenetic tree, whereas short peptides give poor results.

On the other hand, GC-content is an important criterion for determining phylogenetic relationships. Put simply, the closer the GC-content of two species, the closer their phylogenetic relationship, and GC-content is also related to the genomic signature. Studies have found a strong correlation between the frequency distribution of tetranucleotides and the GC-content of the corresponding DNA sequence. However, the resolving power of the GC-content criterion is too low, so it serves only as an adjunct to other identification methods.

This thesis studies the relationship between the peptide composition vector and the GC-content of the corresponding DNA sequence in bacteria, demonstrates the superiority of the former, and offers a new interpretation of the optimal string length.

The results show the following. 1) When the peptide length is less than 4, there is a strong correlation between the peptide composition vector and the GC-content of the corresponding DNA sequence, which means that the information obtained from the two is equivalent. Previous studies have shown that GC-content and the composition vector of short peptides both have a resolution above the genus level, so both can distinguish the specificity of species only at this level. Because it is equivalent to GC-content, the composition vector of short peptides cannot infer the phylogenetic relationships of bacteria at higher levels.
This result also suggests that the genomic signature cannot correctly reconstruct the phylogenetic relationships of microorganisms, because it too is equivalent to GC-content. 2) When the peptide length is greater than 4, the correlation between the peptide composition vector and GC-content changes abruptly, i.e., it quickly tends to vanish. The fact that the peptide composition vector correctly classifies bacterial species when the peptide length is 5 or 6 shows that this method goes beyond GC-content and the genomic signature and captures the specificity of species.
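
The comparison between peptide composition vectors and GC-content can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation. The abstract does not say how the correlation was computed, so the sketch assumes one plausible reading: correlating pairwise composition-vector distances with pairwise GC-content differences across genomes. All function names are hypothetical. The composition vector here uses raw k-peptide frequencies, whereas the method of Hao's group also subtracts a background predicted from shorter strings, which is omitted. Cosine distance is an assumed choice of metric.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def composition_vector(protein: str, k: int) -> dict[str, float]:
    """Relative frequency of each overlapping k-peptide.

    Simplification: the proteome is treated as one string and no
    background subtraction is applied (raw frequencies only).
    """
    if len(protein) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(protein[i:i + k] for i in range(len(protein) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def cosine_distance(u: dict[str, float], v: dict[str, float]) -> float:
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / (norm_u * norm_v)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (undefined if either list is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cv_gc_correlation(genomes: dict[str, tuple[str, str]], k: int) -> float:
    """Correlate pairwise CV distances with pairwise GC-content differences.

    `genomes` maps a species name to a (DNA sequence, protein sequence) pair.
    """
    names = sorted(genomes)
    cv = {name: composition_vector(genomes[name][1], k) for name in names}
    gc = {name: gc_content(genomes[name][0]) for name in names}
    cv_dist, gc_diff = [], []
    for a, b in combinations(names, 2):
        cv_dist.append(cosine_distance(cv[a], cv[b]))
        gc_diff.append(abs(gc[a] - gc[b]))
    return pearson(cv_dist, gc_diff)
```

Calling `cv_gc_correlation(genomes, k)` for k from 1 to 6 on a set of bacterial genomes would show how the correlation changes with peptide length. At least three genomes with differing GC-content are needed for the coefficient to be defined.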
Keywords/Search Tags: Phylogenomics, Alignment-free phylogeny, GC-content, Composition vector