Patterns Of Nucleotides Flanking Substitutions And Relative Entropy Periodicity

Posted on: 2011-08-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Ma
Full Text: PDF
GTID: 1100330332485451
Subject: Bioinformatics
Abstract/Summary:
Substitution is the ultimate source of variation and novelty in evolution. The influence of context on substitution should be reflected in the distribution of sequences surrounding substitution sites. Three-base periodicity is an intrinsic property of coding sequences. However, how three-base periodicity is affected in the vicinity of substitutions remains unclear, and the origin of three-base periodicity is still debated. We inferred 980,930 substitutions in 66,105 genes from 18 species and analyzed the patterns of the nucleotides neighboring these substitutions. We also discuss factors responsible for three-base periodicity. Our investigation of the influence of substitution and its context on three-base periodicity in genes may be of interest for analyzing human genetic variation, as well as for designing gene prediction tools based on three-base periodicity.
Using relative entropy (also known as Kullback-Leibler divergence), we visualized the periodic patterns in human orthologous genes and found that three-base periodicity peaked at the third codon positions (3n) and bottomed at the first (3n-2) or second (3n-1) codon positions. The periodic signals were interrupted near the substitution sites and reappeared farther away, indicating that the periodicity was strongly affected in the close vicinity of a substitution. The highest relative entropy value was often located at sites -1 or +1.
To investigate three-base periodicity, we fitted a sine model to the relative entropy values. The wavelength, amplitude, peak location and trough location of the three-base periodicity were determined using the sine model. The results indicated that a sine with a period of 3 is a good approximation for three-base periodicity at sites not in the close vicinity of substitutions.
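As a rough illustration of the relative entropy and sine-model steps described above, the following Python sketch computes a per-position Kullback-Leibler divergence over aligned flanking sequences and then fits a sine of fixed period 3 by least squares. It is not the dissertation's implementation. The function names, the choice of background distribution (pooled over all positions unless one is supplied) and the fixed period are assumptions made for this example only.

```python
import math
from collections import Counter

BASES = "ACGT"

def column_relative_entropy(seqs, background=None):
    """KL divergence (bits) of each alignment column from a background distribution.

    Assumption: if no background is given, it is estimated by pooling all
    positions of all sequences. Every column must contain at least one base.
    """
    if background is None:
        pooled = Counter(b for s in seqs for b in s if b in BASES)
        total = sum(pooled.values())
        background = {b: pooled[b] / total for b in BASES}
    values = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs if s[i] in BASES)
        n = sum(counts.values())
        d = 0.0
        for b in BASES:
            p = counts[b] / n
            if p > 0:  # terms with p = 0 contribute nothing
                d += p * math.log2(p / background[b])
        values.append(d)
    return values

def fit_period3_sine(values):
    """Least-squares fit of y = mean + A*sin(2*pi*x/3 + phase), x = 0, 1, 2, ...

    Only a whole number of periods is used, so the constant, sine and cosine
    terms are orthogonal and the coefficients follow from simple projections.
    """
    n = len(values) - len(values) % 3
    ys = values[:n]
    mean = sum(ys) / n
    w = 2 * math.pi / 3
    a = 2 / n * sum((y - mean) * math.sin(w * x) for x, y in enumerate(ys))
    b = 2 / n * sum((y - mean) * math.cos(w * x) for x, y in enumerate(ys))
    return mean, math.hypot(a, b), math.atan2(b, a)

def sine_value(x, mean, amplitude, phase):
    """Evaluate the fitted period-3 sine at position x."""
    return mean + amplitude * math.sin(2 * math.pi * x / 3 + phase)
```

Positions close to a substitution site would be excluded before fitting, since the abstract reports that the periodic signal is interrupted there. Fixing the period at 3 keeps the fit linear. Estimating the wavelength itself, as the abstract describes, would need a nonlinear fit that this sketch does not attempt.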
The usage biases of the flanking nucleotides extend no farther than two nucleotides from the substitution sites. In terms of the neighboring effect on substitutions, closely related species were more similar to each other than distantly related species were. Neighboring patterns differed both among substitution categories and, within a category, among the three codon positions at which the substitution occurred.
We discuss factors responsible for three-base periodicity with the aid of two control sets in which codons were shuffled in two different manners. The codon-shuffled sequences strictly preserved the codon usage frequencies of the corresponding native sequences. Thus, if the periodic signals of the native flanking sequences were determined by codon usage frequencies alone, codon shuffling would not change the periodic signals. However, changes clearly appeared, suggesting that codon usage frequency was not the sole origin of three-base periodicity. Because the native and codon-shuffled datasets differ only in codon order, the native order of codons also plays an important role in this periodicity. Synonymous codon shuffling revealed associations between codon degeneracy and three-base periodicity, and showed that synonymous codon usage bias was one of the factors responsible for the observed three-base periodicity.
The relative frequencies of CG→TG and CG→CA were greater than 20% in genes but lower than 5.5% in CpG islands. These results imply that the pattern of substitutions in genes is dominated by the hypermutability of CpG, whereas the effect of CpG was suppressed in CpG islands. GC content in CpG islands was close to equilibrium. The relative entropies did not reveal periodic signals in CpG islands.
Our results offer an efficient way to illustrate unusual periodic patterns in the context of substitutions and provide further insight into the origin of three-base periodicity. This periodicity results from the native codon order in the reading frame, and the period length of 3 is caused by the usage bias of nucleotides in synonymous codons.
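The two codon-shuffled control sets mentioned above can be sketched as follows. The abstract does not give the exact shuffling procedures, so this Python example rests on one plausible reading: a whole-sequence codon shuffle that preserves codon usage but destroys codon order, and a synonymous shuffle that permutes codons only among positions encoding the same amino acid. The function names and the use of the standard genetic code are assumptions for illustration.

```python
import random

# Standard genetic code (NCBI translation table 1), built in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping any partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def translate(cds):
    """Translate uppercase A/C/G/T codons with the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in split_codons(cds))

def shuffle_codons(cds, rng):
    """Control 1 (assumed): permute all codons; codon usage is kept, order is not."""
    codons = split_codons(cds)
    rng.shuffle(codons)
    return "".join(codons)

def shuffle_synonymous(cds, rng):
    """Control 2 (assumed): permute codons only among synonymous positions.

    The encoded protein and the codon usage are both preserved.
    """
    codons = split_codons(cds)
    positions_by_aa = {}
    for idx, codon in enumerate(codons):
        positions_by_aa.setdefault(CODON_TABLE[codon], []).append(idx)
    shuffled = list(codons)
    for positions in positions_by_aa.values():
        pool = [codons[i] for i in positions]
        rng.shuffle(pool)
        for i, codon in zip(positions, pool):
            shuffled[i] = codon
    return "".join(shuffled)
```

Passing a seeded `random.Random` instance makes the control sets reproducible. Comparing the relative entropy profiles of the native and shuffled sequences then separates the contribution of codon usage from that of codon order.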
Keywords/Search Tags: substitution, sequence context, three-base periodicity, relative entropy, sine model