The Distributions Of CG 8-mer Subsets On The Regions Of Different Functional Sites In Six Species Genes

Posted on: 2019-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
GTID: 2370330563456847
Subject: Physics
Abstract/Summary:
Each genome sequence has a definite k-mer spectrum, and the k-mer spectra of different genome sequences differ. Studying the intrinsic laws of k-mer spectra helps us better understand the structure of genome sequences, the distribution characteristics of the various k-mers, and the biological functions reflected by functional sequences. Previous work studied the distribution regularities of 8-mer spectra in different genomic sequences and found that, apart from very few species, the 8-mer spectra of genomic sequences obey an independent selection law: the three CG 8-mer subsets (0CG, 1CG and 2CG, classified by the number of CG dinucleotides an 8-mer contains) have evolved independently, and any DNA sequence is composed of 8-mers from these three subsets. Building on this law, sequence sets around transcription start sites, transcription termination sites, translation initiation sites, translation termination sites, intron-exon junction sites and exon-intron junction sites were selected as target sequences from the genes of H. sapiens, D. rerio, A. thaliana, O. sativa, A. gambiae and A. mellifera. For each of the six species, the 8-mer spectra and the positional distributions of the three CG 8-mer subsets in the regions of the six functional sites were analyzed, and differences in x-mer (x = 3, 4) usage within the three CG 8-mer subsets in these regions were studied. The aim is to explore how the distribution regularities of the three CG 8-mer subsets in different functional regions of a gene relate to species evolution.

First, the 8-mer spectra of the three CG 8-mer subsets in the six functional regions were obtained. The spectra of the three subsets in H. sapiens, D. rerio, A. thaliana and O. sativa still comply with the independent selection law, whereas those of A. gambiae and A. mellifera do not.

Second, the relative distance (RD) from the most probable frequency of each of the three CG 8-mer spectra in the six functional regions to the corresponding random center frequency was calculated. In H. sapiens, D. rerio, A. thaliana and O. sativa, the three spectra are separated (RD0 > RD1 > RD2), and the degree of separation correlates positively with species evolution: the higher the evolutionary level of a species, the farther apart the three spectral distributions lie. In A. gambiae the ordering is exactly the opposite (RD0 < RD1 < RD2), and in A. mellifera the three spectra show no obvious separation.

Finally, the relative standard deviations (RS) of the three spectra were calculated. In H. sapiens, D. rerio, A. thaliana and A. mellifera, the RS values follow a clear order, RS0 < RS1 < RS2, consistent with the order found for whole genomes; that is, the frequency usage of the 1CG and 2CG 8-mer subsets is significantly more conserved than that of the 0CG subset. In O. sativa and A. gambiae, the 0CG and 1CG subsets still satisfy RS0 < RS1. However, around the six functional sites in O. sativa the 2CG subset is less conserved than the 1CG subset, and in A. gambiae the conservation of the three subsets does not differ significantly.

The positional distributions of the three CG 8-mer subsets in the six functional regions were then explored. They differ among the six regions, each region showing its own distribution characteristics. The distributions are similar among the vertebrates and among the plants, whereas those of A. gambiae and A. mellifera are more diverse. The distributions around the transcription start site and the translation initiation site share some similarities, as do those around the transcription termination site and the translation termination site. The distributions around the two start sites and the two stop sites show symmetric properties, as do those around the intron-exon and exon-intron junction sites. For H. sapiens, D. rerio, A. thaliana and O. sativa, the distributions of the three CG 8-mer subsets around the two start sites and the two stop sites change regularly with species evolution, whereas all six species exhibit similar distribution patterns around the junction sites between exons and introns. For A. gambiae and A. mellifera, the distributions around the two start sites and the two stop sites differ significantly from those of the other four species.

To characterize the information carried by the CG 8-mers, the relative frequencies of x-mers (x = 3, 4) within the three CG 8-mer subsets were computed, and a new symmetric relative entropy was defined from these relative frequencies. Its distributions in the six functional regions of the six species were then calculated. For H. sapiens, D. rerio, A. thaliana and O. sativa, the symmetric relative entropy deviates most for the 2CG subset, less for the 1CG subset, and hardly at all for the 0CG subset. This shows that the 1CG and 2CG 8-mer subsets are the main signal motifs in functional-site regions, while the 0CG subset forms the background. A. gambiae and A. mellifera are exceptional: their distributions show a clear preference around the six functional sites, and their shapes differ from those of the other four species. A. mellifera shows the greatest degree of preference, and A. gambiae the smallest degree of deviation.

In short, the gene sequences of H. sapiens, D. rerio, A. thaliana and O. sativa around each functional site follow the independent selection law. The distributions of the three CG 8-mer subsets differ among the six functional sites, and the distribution patterns are closely related to species evolution. The results show that motifs containing the CG dinucleotide are the core motifs that make up the various functional sequences, and that differences in their content and distribution across functional regions determine the functional differences between those regions. A. gambiae and A. mellifera do not follow the independent selection law; their three CG 8-mer subsets have their own distinctive distribution rules around the functional sites. The independent selection law offers a novel approach to studying sequence structure and is of great theoretical significance for exploring the relationship between sequences and biological functions.
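The subset spectra described in the abstract can be sketched as follows: count every overlapping 8-mer in a sequence, assign each of the 4^8 = 65,536 possible 8-mers to a subset by its number of CG dinucleotides, and tabulate how many 8-mers of each subset occur with each frequency. This is a minimal Python sketch, not the thesis's own code. Two choices are assumptions: 8-mers with two or more CGs are pooled into the 2CG subset, and 8-mers that never occur are counted at frequency 0. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

K = 8  # word length used in the abstract
VALID = frozenset("ACGT")


def cg_subset(kmer: str) -> str:
    """Assign a k-mer to 0CG, 1CG or 2CG by its number of CG dinucleotides.

    Assumption: k-mers with two or more CGs are pooled into '2CG'.
    "CG" cannot overlap itself, so str.count gives the exact number.
    """
    return f"{min(kmer.count('CG'), 2)}CG"


def kmer_counts(seq: str, k: int = K) -> Counter:
    """Occurrences of every overlapping k-mer made only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if VALID.issuperset(word):  # skip windows containing N or other symbols
            counts[word] += 1
    return counts


def subset_spectra(seq: str, k: int = K) -> dict:
    """For each CG subset, map frequency -> number of k-mers with that frequency.

    All 4**k possible k-mers are enumerated, so absent k-mers land at frequency 0.
    """
    counts = kmer_counts(seq, k)
    spectra = {"0CG": Counter(), "1CG": Counter(), "2CG": Counter()}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        spectra[cg_subset(word)][counts[word]] += 1
    return spectra
```

Plotting `spectra["0CG"]`, `spectra["1CG"]` and `spectra["2CG"]` with frequency on the x-axis and number of 8-mers on the y-axis gives three curves of the kind compared in the abstract. For a set of sequences around a functional site, the per-sequence `kmer_counts` results can be summed before building the spectra.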
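The abstract does not give formulas for RD or RS, so the sketch below uses plausible stand-ins, labeled as assumptions. The most probable frequency is taken as the mode of a subset's spectrum. The random center frequency is taken as the count every 8-mer would share if all 4^8 8-mers were equally likely. RD is taken as the signed offset of the mode from that center, divided by the center. RS is the usual relative standard deviation, i.e. the population standard deviation over the mean of the subset's 8-mer frequencies. All function names and the toy data are hypothetical.

```python
import statistics
from collections import Counter


def most_probable_frequency(freqs: list) -> int:
    """Mode of a spectrum: the frequency shared by the largest number of k-mers.

    Ties go to the smaller frequency (an arbitrary choice for this sketch).
    """
    spectrum = Counter(freqs)
    return min(spectrum, key=lambda f: (-spectrum[f], f))


def relative_distance(freqs: list, n_windows: int, k: int = 8) -> float:
    """Hypothetical signed RD: (mode - random center) / random center.

    Assumption: the random center is the count each k-mer would have in
    n_windows k-mer windows if all 4**k k-mers were equiprobable.
    """
    f_random = n_windows / 4 ** k
    return (most_probable_frequency(freqs) - f_random) / f_random


def relative_std(freqs: list) -> float:
    """RS as the coefficient of variation: population std / mean."""
    return statistics.pstdev(freqs) / statistics.fmean(freqs)


# Toy per-8-mer frequencies for each subset (illustrative, not real data).
toy = {"0CG": [6, 6, 7, 5], "1CG": [4, 4, 3], "2CG": [1, 1, 2]}
n = 4 * 4 ** 8  # chosen so that the random center frequency is exactly 4
for name, freqs in toy.items():
    print(name, relative_distance(freqs, n), round(relative_std(freqs), 3))
```

In this toy input RD falls from 0.5 (0CG) to 0.0 (1CG) to -0.75 (2CG), the same ordering as the separation RD0 > RD1 > RD2 reported for the four species. On real spectra the raw mode can be noisy, so some binning or smoothing before taking the mode may be needed.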
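The positional distributions around a functional site can be sketched by aligning all sequences on the site and, at each window position, recording what fraction of sequences have an 8-mer from each CG subset there. This minimal Python sketch assumes every sequence is already aligned so that the same index marks the site; `position_profile` and `site_offset` are hypothetical names. As in any sketch here, pooling two or more CGs into 2CG is an assumption.

```python
def cg_subset(kmer: str) -> str:
    """0CG, 1CG or 2CG by number of CG dinucleotides (assumption: >= 2 pooled into 2CG)."""
    return f"{min(kmer.count('CG'), 2)}CG"


def position_profile(seqs: list, site_offset: int, k: int = 8) -> dict:
    """Fraction of sequences whose k-mer starting at each position falls in each subset.

    Every sequence is assumed to be aligned so that index `site_offset` is the
    functional site; result keys are window start positions relative to the site.
    Sequences are truncated to the shortest one so all windows are comparable.
    """
    length = min(len(s) for s in seqs)
    profile = {}
    for start in range(length - k + 1):
        tally = {"0CG": 0, "1CG": 0, "2CG": 0}
        for s in seqs:
            tally[cg_subset(s[start:start + k].upper())] += 1
        profile[start - site_offset] = {name: c / len(seqs) for name, c in tally.items()}
    return profile
```

Plotting the three fractions against relative position for, say, sequences centered on translation initiation sites gives one curve per subset; repeating this for each of the six site types and species allows the kind of cross-site and cross-species comparison the abstract describes.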
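The abstract does not define its "new symmetric relative entropy", so the sketch below swaps in a standard symmetric measure, the Jeffreys divergence D(p||q) + D(q||p), applied to x-mer relative frequencies. How x-mer frequencies are taken "in" a CG subset is also an assumption here: the x-mers are counted inside every 8-mer occurrence that belongs to the subset, with a pseudocount so no frequency is zero. The names `xmer_frequencies`, `symmetric_relative_entropy` and the `pseudo` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def cg_subset(kmer: str) -> str:
    """0CG, 1CG or 2CG by number of CG dinucleotides (assumption: >= 2 pooled into 2CG)."""
    return f"{min(kmer.count('CG'), 2)}CG"


def xmer_frequencies(seq: str, subset: str, x: int = 3, k: int = 8,
                     pseudo: float = 0.5) -> dict:
    """Relative frequencies of x-mers inside the k-mer occurrences that fall in `subset`.

    A pseudocount is added to each of the 4**x x-mers so every frequency is positive;
    x-mers containing symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if cg_subset(word) == subset:
            for j in range(k - x + 1):
                counts[word[j:j + x]] += 1
    alphabet = ["".join(p) for p in product("ACGT", repeat=x)]
    total = sum(counts[a] for a in alphabet) + pseudo * len(alphabet)
    return {a: (counts[a] + pseudo) / total for a in alphabet}


def symmetric_relative_entropy(p: dict, q: dict) -> float:
    """Jeffreys divergence D(p||q) + D(q||p) in nats; p and q share the same keys."""
    return sum((p[a] - q[a]) * math.log(p[a] / q[a]) for a in p)
```

Comparing, for each subset, `xmer_frequencies(region, subset)` for a functional-site region with the same quantity for a background sequence gives one divergence per subset. Under the abstract's findings for the four species obeying the law, one would expect this value to be largest for 2CG and near zero for 0CG, although this sketch's measure is not guaranteed to reproduce the thesis's numbers.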
Keywords/Search Tags: functional region, 8-mer spectrum, CG 8-mer subsets, difference of distribution, species evolution