
Verification of the Independent Evolution Law and the Corresponding Biological Functions of 8-mers in Yeast Genome Sequences

Posted on: 2018-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhe
GTID: 1310330515455317
Subject: Theoretical Physics

Abstract/Summary:
K-mer usage is non-random on a genome-wide scale, and different kinds of k-mers have different biological functions. Uncovering the rules of k-mer usage and the associated biological functions is important for understanding the evolution of genomic structure and for systematically characterizing functional segments. K-mer spectra from more than 100 genomes fall into two states: multimodal in tetrapods and unimodal in the other species. The cause of the multimodal spectra is still debated. One study attributed multimodality to different classes of functional or structural elements. Another pointed out that multimodal spectra were characterized by G+C content and CpG suppression, whereas Hashim's suggestion was the two lowest models instead of CpG suppression. The reason for the different k-mer spectra remains to be studied further. In this dissertation, starting from the distribution characteristics of human k-mer spectra, we used statistical analysis and bioinformatics methods to study the distribution rules of k-mer spectra, discuss the independent evolutionary mechanism of CG subsets, and propose and verify the biological functions of the three CG subsets in yeast genome sequences. The main contents are as follows.

First, we calculated the distribution of relative motif number (RMN) against 8-mer frequency (the 8-mer spectrum) in human chromosome 1 and found that the 8-mer spectrum was trimodal. When the 8-mers were divided into three subsets under each of the 16 dinucleotide classifications, only the spectra under the CG classification formed independent unimodal distributions; we call this the independent evolution law of CG subsets. The positions of these distributions corresponded strictly to the three peaks of the total 8-mer spectrum. It followed that whether a spectrum was trimodal or unimodal was determined by the degree of separation among the three CG spectra. Compared with the random 8-mer spectrum, the CG0 spectrum lay near the center of the random distribution, whereas the peaks of the CG1 and CG2 spectra were far from the random center. Thus the 8-mers containing CG dinucleotides underwent directional evolution, and the 8-mers without CG dinucleotides followed random evolution. The CG subsets had two distinct characteristics: (1) the most probable RMN values of the CG1 and CG2 spectra were significantly larger than that of CG0; (2) the distribution widths of the CG2 and CG1 spectra were much narrower than that of CG0. This meant that the usage of CG1 and CG2 8-mers was conservative. After analyzing the sequence characteristics of the three CG subsets, nucleosome core sequences (NCSs) and CpG islands (CGIs), two theoretical conjectures were proposed: (1) CG1 motifs (8-mers containing one CG) are nucleosome binding motifs; (2) CG2 motifs (8-mers containing two or more CGs) are the modular units of CpG islands.

Second, the 8-mer spectrum was unimodal in yeast genome sequences. We calculated the 8-mer spectra under the 16 XY dinucleotide classifications in yeast and found that the properties of the three CG spectra agreed with those in human. This showed that the unimodal yeast spectrum resulted from the superposition of three CG spectra in close proximity, and that CG1 and CG2 8-mers were more conserved. The independent evolution phenomenon therefore already appears in yeast, the lowest eukaryote. Because the CG2, CG1 and CG0 subsets contain too many 8-mers, m-mer (m = 2, 3, 4) frequencies within each subset were used to represent the sequence characteristics of the 8-mers. The bias of m-mer usage differed among the three CG subsets, and in the analysis of the 16 kinds of XY1 8-mers the most obvious deviation (NSRE) occurred in the CG1 subset. This led us to conclude that the CG dinucleotide is at the heart of evolution from simple to complex.

Third, to verify whether CG1 8-mers were nucleosome binding motifs, the motif information in the three CG subsets was assigned to nucleosome core sequences and linker sequences (NLSs) for binary classification assessment. The highest average AUC value was obtained with CG1 information, indicating that the information in CG1 motifs favored NCSs over NLSs. The motif information in CG1 8-mers was then assigned to nucleosome core sequences to obtain NSRE distributions. The shapes of the distributions were consistent with published results; optimal motifs might constitute the basic framework of NCSs, while rare motifs determined the fine structure. When a standard histone octamer was laid out as a one-dimensional array along the double-stranded DNA, there was a one-to-one relationship between the abundant CG1 signal regions in the NSRE distribution and the histone positions. These two results confirmed the conjecture about CG1 8-mers.

Fourth, a breakdown of nucleosome positions at single-base-pair accuracy showed that some NCSs were in a squeezed state. According to the distances between neighboring dyads, NCSs were divided into four groups: usual NCSs (UN), 5' squeezed NCSs (5'SN), 3' squeezed NCSs (3'SN) and bilateral squeezed NCSs (SN). Based on the conclusion that CG1 motifs were nucleosome binding motifs, an analysis of the NSRE distribution features in the four kinds of nucleosomes found that squeezing was accompanied by changes in sequence structure at both the squeezed end and the un-squeezed end, and that the sequence changes of squeezed nucleosomes were related to the strength of CG1 signals. All NLSs were classified into 11 groups according to their length. Four types of conserved motifs were found in the length groups using the MEME Suite, which reflects the diversity of linker sequences.

Fifth, to verify whether CG2 8-mers were the modular units of CGIs, the motif information in the CG2, CG1 and CG0 subsets was assigned to CGIs and non-CGIs for binary classification assessment. The AUC values obtained were 0.95, 0.80 and 0.02, respectively, showing that CG2 information was in line with the structural information of CGIs. After selecting the best cutoff value on the ROC curves, calculations of the total accuracy (ACC) and the Matthews correlation coefficient (MCC) further confirmed that CGIs could be characterized by CG2 information. Thus the conjecture about the biological function of CG2 motifs was verified.
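
The CG classification described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's code. The function names (`cg_class`, `count_kmers`, `subset_spectra`) are hypothetical, and several choices are assumptions made only for illustration: k-mers are counted on a single strand with a sliding window, only observed k-mers are included, and the "relative motif number" at a given frequency is taken to be the fraction of motifs in a subset that occur with that frequency.

```python
from collections import Counter


def cg_class(kmer: str) -> int:
    """Return 0, 1 or 2 for the CG0, CG1 and CG2 subsets.

    CG2 collects k-mers with two or more CG dinucleotides, as in the abstract.
    "CG" cannot overlap itself, so str.count gives the exact number.
    """
    return min(kmer.count("CG"), 2)


def count_kmers(seq: str, k: int = 8) -> Counter:
    """Count every k-length window of seq (single strand, an assumption)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(window) <= set("ACGT"):
            counts[window] += 1
    return counts


def subset_spectra(counts: Counter) -> dict:
    """Build one spectrum per CG subset.

    Each spectrum maps a k-mer frequency to the fraction of motifs in that
    subset observed with that frequency (an assumed reading of RMN).
    """
    histograms = {0: Counter(), 1: Counter(), 2: Counter()}
    for kmer, freq in counts.items():
        histograms[cg_class(kmer)][freq] += 1

    spectra = {}
    for label, hist in histograms.items():
        total = sum(hist.values())
        spectra[label] = (
            {freq: n / total for freq, n in sorted(hist.items())} if total else {}
        )
    return spectra
```

On a real chromosome one would call `subset_spectra(count_kmers(sequence))` and plot the three returned spectra together with their sum, to see whether the CG0, CG1 and CG2 distributions are separated (trimodal total) or close together (unimodal total).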
Keywords/Search Tags:the yeast genome, 8-mer spectra, CG motifs, independent evolution, biological function conjectures, functional validation