
The Statistical Relationship Between mRNA Sequence, Structure, Energy and Protein Secondary Structure

Posted on: 2005-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M W Jia
GTID: 1100360125952794
Subject: Theoretical Physics
Abstract/Summary:
Beyond the information stored in the amino acid sequence, an mRNA sequence carries additional information, such as codon bias and its own secondary structure. In this work, we study the relationship between mRNA sequence and structure and the secondary structure of the encoded protein. The work is divided into four parts.

In the first part, to support statistical analysis of the relationship between mRNA and protein secondary structure, a new integrated sequence-structure database called IADE (Integrated ASTRAL-DSSP-EMBL) is constructed. IADE comprises two subsets: IADE1, which contains amino acid sequences and the corresponding protein secondary structure data, and IADE2, which incorporates matching mRNA sequence, amino acid sequence and protein secondary structure data. IADE1 and IADE2 include 2269 and 648 protein domains, respectively.

In the second part, based on the IADE database, we study the relation between mRNA stem/loop frequencies and protein secondary structure. The mRNA secondary structure was predicted with RNAstructure 3.6. Statistical analysis revealed that alpha helices and beta strands in proteins tend to be coded by mRNA stem regions, while coil tends to be coded by mRNA loop regions. To obtain better statistics, a structural word (SW) is defined as a four-amino-acid fragment that shows a pronounced secondary structural (alpha helix or beta strand) propensity. These tendencies are more pronounced when only the structural words are considered.
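
The stem/loop counting described above can be sketched roughly as follows. This is only an illustrative sketch, not the procedure used in the thesis: it assumes the predicted mRNA structure is available in dot-bracket notation, that a codon counts as "stem" when at least two of its three nucleotides are paired, and that the protein secondary structure is given as one letter per residue (H, E or C). The function names `codon_states` and `stem_loop_counts` are hypothetical.

```python
from collections import Counter


def codon_states(dot_bracket: str) -> list[str]:
    """Label each complete codon 'stem' or 'loop'.

    Assumption (not from the thesis): a codon is 'stem' when at least
    two of its three nucleotides are base-paired in the dot-bracket string.
    """
    usable = len(dot_bracket) - len(dot_bracket) % 3  # drop a trailing partial codon
    states = []
    for i in range(0, usable, 3):
        paired = sum(ch in "()" for ch in dot_bracket[i:i + 3])
        states.append("stem" if paired >= 2 else "loop")
    return states


def stem_loop_counts(dot_bracket: str, protein_ss: str) -> Counter:
    """Count (protein state, mRNA state) pairs, one pair per residue."""
    return Counter(zip(protein_ss, codon_states(dot_bracket)))


# Toy input for illustration only; these are not data from the thesis.
structure = "(((((((((......)))))))))"  # 24 nucleotides = 8 codons
secondary = "HHHCCHHH"                  # H = helix, C = coil
counts = stem_loop_counts(structure, secondary)
print(counts)
```

A table of such counts over many domains is the kind of input on which a preference for stem or loop regions could then be tested for significance.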
Statistical significance analysis shows that the deduced correlations between protein secondary structure and mRNA stem/loop structure are significant and can hardly be explained as stochastic fluctuations. As a complement, we also analyze the relationship between mRNA stem/loop content and protein secondary structure. The result shows that regular structures tend to be coded by mRNA segments with high stem content, while coil tends to be coded by segments with low stem content.

In the third part, from statistical analysis of protein sequences for human and E. coli, we have found that messenger RNA segments of m codons (for m = 2 to 6) with high average tRNA copy number (TCN) (larger than ~10.5 for human or ~1.95 for E. coli) preferentially code for alpha helix, and those with low average TCN (smaller than ~7.5 for human or ~1.7 for E. coli) preferentially code for coil. Between them there is an intermediate region with no correlation to structure preference. For beta strand the preference/avoidance tendency is not obvious. All strong preference modes of TCN for protein secondary structures have been deduced. Statistical significance analysis using the Tukey test shows that the interaction between protein secondary structural type and codon TCN is highly significant. Based on the idea that translational efficiency and accuracy depend on codon usage, a phenomenological model is proposed. Using the model, we give a preliminary analysis of the relationship between protein structure preference and codon TCN.

In the last part, we have studied the relationship between protein secondary structure and mRNA folding energy using 107 mRNA sequences of E. coli and 125 mRNA sequences of human. Because codon usage and its influence on gene expression differ considerably from species to species, we analyze the relation between protein secondary structure and mRNA folding energy separately for human and E. coli.
The mRNA folding energy and secondary structure of each native sequence and of the corresponding randomized sequences are calculated with the RNAfold program. The Z score is computed as the difference between the folding free energy of the native sequence and the mean folding free energy of the corresponding randomized sequences, expressed in units of the standard deviation of the randomized energies. According to the Z score, the folding free energy of native sequences is on average lower than that of randomized sequences by 1.69 standard deviations for E. coli (Z score = -1.69) and by 2.26 standard deviations for human (Z score = -2.26). On the other hand, the mRNA folding free energy or corresponding Z score decreases...
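
The Z score defined above can be illustrated with a minimal sketch. It assumes the folding free energies have already been computed (for example by a folding program) and uses the sample standard deviation; the abstract does not say whether the sample or population form was used, nor how the randomized sequences were generated. The function name `folding_z_score` and the energy values are hypothetical.

```python
import statistics


def folding_z_score(native_energy: float, random_energies: list[float]) -> float:
    """Z score of a native folding free energy against randomized controls.

    Z = (E_native - mean(E_random)) / sd(E_random).
    Assumption: sd is the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(random_energies)
    sd = statistics.stdev(random_energies)
    return (native_energy - mean) / sd


# Toy energies in kcal/mol for illustration only; not data from the thesis.
native = -30.0
randomized = [-20.0, -22.0, -18.0, -24.0, -16.0]
z = folding_z_score(native, randomized)
print(round(z, 3))
```

A negative Z score means the native sequence folds into a more stable (lower free energy) structure than its randomized counterparts, which is the direction reported for both species.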
Keywords/Search Tags: Sequence-structure database, protein secondary structure, stem/loop structure, tRNA copy number, codon preference, mRNA folding free energy, mRNA native sequence, random sequence, Z-score, statistical correlation