
Generalized CGR Representation Of Protein Sequences And Its Application

Posted on: 2017-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: S N Xu
Full Text: PDF
GTID: 2180330482980621
Subject: Mathematics
Abstract/Summary:
Chaos research is applied in many fields, such as mathematics, physics, biology, meteorology, engineering, and economics. Some scholars have introduced chaos theory into the analysis of DNA sequences through a method known as chaos game representation (CGR). In this line of research, substrings of a DNA sequence are mapped to points in space by an iterated function. Visualizing sequences in this way has revealed many interesting results about the structure of genomes.

In this paper, the CGR graphical representation of DNA is generalized to the analysis of protein sequences. Based on a classification and a new iteration function, a CGR graphical representation of proteins is introduced. Furthermore, a new numerical characterization is proposed for comparing the similarity of protein sequences. The usefulness of this approach is illustrated by comparing the sequences of sixteen ND5 proteins, twenty-nine spike proteins, and the ND6 proteins of eight species. Based on the comparison results, we construct phylogenetic trees that are consistent with the biological evolution of the species. Using correlation analysis, we compare our results with ClustalW results and with those of some other graphical representations to demonstrate the effectiveness of our approach.

Moreover, the generalized CGR graphical representation of proteins is applied to protein family analysis. Using the graphical representation above, generalized CGR representations are constructed for eight protein families from four protein structural classes: all-α, all-β, α/β, and α+β. The frequencies of points in the 8 sub-regions of the cube are taken as the numerical characteristics of each representation. The results show that our method not only effectively distinguishes different protein structures but also clearly recognizes the structural pattern of each protein family.

Different parameters of the iteration function are selected to show their effect on the CGR graphical representation of proteins. The TBP-like protein family is used as an example to demonstrate that the smaller the parameter α, the more separated the fractal images are.
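As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch maps a protein sequence to points in the unit cube and counts point frequencies in the 8 sub-regions. It is not the thesis's method: the abstract gives neither the classification nor the iteration function, so the grouping `GROUPS`, the update rule `p = alpha * p + (1 - alpha) * v`, and the Euclidean `distance` used for comparison are all assumptions chosen only to make the idea concrete.

```python
from itertools import product
from math import dist

# ASSUMPTION: an arbitrary placeholder grouping of the 20 amino acids into
# 8 classes, one per cube vertex. It is NOT the classification in the thesis.
GROUPS = ["AG", "VLI", "MC", "FWY", "ST", "NQ", "DE", "KRHP"]
VERTICES = list(product((0.0, 1.0), repeat=3))  # the 8 corners of the unit cube
VERTEX_OF = {aa: VERTICES[i] for i, group in enumerate(GROUPS) for aa in group}


def cgr_points(seq, alpha=0.5):
    """Return one 3D point per residue.

    ASSUMPTION: the update rule is p <- alpha * p + (1 - alpha) * vertex,
    starting from the cube centre; the thesis's iteration function may differ.
    """
    p = (0.5, 0.5, 0.5)
    points = []
    for aa in seq:
        v = VERTEX_OF[aa]
        p = tuple(alpha * pc + (1 - alpha) * vc for pc, vc in zip(p, v))
        points.append(p)
    return points


def octant_frequencies(points):
    """Fraction of points in each of the 8 half-cube sub-regions."""
    counts = [0] * 8
    for x, y, z in points:
        index = (x >= 0.5) * 4 + (y >= 0.5) * 2 + (z >= 0.5)
        counts[index] += 1
    total = len(points)
    return [c / total for c in counts] if total else counts


def distance(seq_a, seq_b, alpha=0.5):
    """ASSUMPTION: Euclidean distance between frequency vectors, used here
    only as a stand-in for the thesis's numerical characterization."""
    fa = octant_frequencies(cgr_points(seq_a, alpha))
    fb = octant_frequencies(cgr_points(seq_b, alpha))
    return dist(fa, fb)
```

Under this assumed update rule, a smaller `alpha` pulls each point closer to its vertex, so the clusters around the 8 corners sit further apart.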
Keywords/Search Tags:3D-CGR, iterated function system, fractal structure, protein sequences, similarity