On The Designability Of Protein Structure, Prediction Of Structural Classes Of Proteins And Human PolⅡ Promoter

Posted on: 2009-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Yang
Full Text: PDF
GTID: 2120360245990290
Subject: Applied Mathematics
Abstract/Summary:
This thesis consists of three main parts.

Firstly, we study the designability of protein structure. Based on random samplings of protein structure space and common biased sampling of sequence space, the designability of protein structures is calculated using six lattice types (4×4, 5×5 and 6×6 square lattices, a 3×3×3 cubic lattice, and 2 + 3 + 4 + 3 + 2 and 4 + 5 + 6 + 5 + 4 triangular lattices), three alphabet sizes (HP, HNUP, and 20 letters) and two energy functions. Three quantities defined to elucidate the designability are then calculated: the stability, foldability and partnum of the structure. We find that, regardless of the lattice type, alphabet size and energy function used, highly designable structures emerge. In all cases considered, the local interactions reduce the degeneracy rate of protein sequences and make the designability higher. The designability is sensitive to the lattice type, alphabet size and energy function. The correlation coefficients between the designability, stability and foldability are mostly larger than 0.5, which demonstrates that they have a strong linear correlation. The linear correlation between the designability and partnum is weaker, because partnum is independent of the energy [1].

Secondly, we discuss the clustering analysis for four protein structural classes (i.e., α, β, α+β and α/β). We propose to use the Schneider and Wrede hydrophobicity scale of amino acids and the 6-letter model of protein to study the classification of large proteins into secondary structural classes. Two kinds of multifractal analyses are performed on the two measures obtained from these two kinds of data on large proteins. Nine parameters are obtained from the calculation and used to construct parameter spaces, in which each protein is represented by one point. A procedure is proposed to separate large proteins from the α, β, α+β and α/β structural classes in these parameter spaces. Fisher's linear discriminant algorithm is used to assess our clustering accuracy on the 49 selected large proteins. Numerical results indicate that the discriminant accuracies are relatively high. In particular, they reach 100.00% and 84.21% in separating the α proteins from the {β, α+β, α/β} proteins in a 3-dimensional (3D) parameter space; 92.86% and 86.96% in separating the β proteins from the {α+β, α/β} proteins in another 3D parameter space; and 91.67% and 83.33% in separating the α/β proteins from the α+β proteins in the last 3D parameter space [2].

Finally, we predict human Pol II promoters, which mainly amounts to distinguishing between promoter and non-promoter sequences. Here, the non-promoter sequences are made up of exon and intron sequences. Four methods are used: two kinds of multifractal analyses performed on the sequences obtained from the dinucleotide free energy, Z curve analysis, and the global descriptor of the promoter/non-promoter primary sequences. A total of 141 parameters are extracted from these methods and categorized into seven groups. They are used to generate parameter spaces, and each promoter/non-promoter sequence is represented by a point in the corresponding space. Based on Fisher's linear discriminant algorithm, after testing all the 120 possible combinations of the seven groups, we obtain satisfactory discriminant accuracies with a relatively small number of parameters (96 and 117). In particular, in the case of 117 parameters, the accuracies for the training and test sets reach 90.43% and 89.79%, respectively. A comparison with five other existing methods indicates that our methods perform better. Using the global descriptor method (36 parameters), 17 of the 18 experimentally verified promoter sequences of human chromosome 22 are correctly identified [3].
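
As an illustration of the Z curve analysis mentioned above, the following minimal sketch computes the standard cumulative Z curve coordinates of a DNA sequence. The abstract does not say how the thesis turns the Z curve into discriminant parameters, so the function name `z_curve` and the choice to return every cumulative point are assumptions made only for this example.

```python
def z_curve(seq):
    """Return the cumulative Z curve points (x_n, y_n, z_n) of a DNA sequence.

    Hypothetical helper for illustration; it is not taken from the thesis.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            raise ValueError(f"unexpected base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append((
            (a + g) - (c + t),  # x: purines minus pyrimidines
            (a + c) - (g + t),  # y: amino bases minus keto bases
            (a + t) - (c + g),  # z: weak minus strong hydrogen-bond bases
        ))
    return points


if __name__ == "__main__":
    for n, point in enumerate(z_curve("ACGT"), start=1):
        print(n, point)
```

Each point depends only on the cumulative base counts up to position n, so a sequence of length N yields N points. Summary statistics of such points could then serve as coordinates in a parameter space of the kind described above, although the exact parameters used in the thesis are not given here.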
Keywords/Search Tags: Designability, protein structural classes, multifractal analysis, Z curve, global descriptor, promoter/non-promoter