
A Novel 3D Graphical Representation Based On Chaos Game Representation And Its Application

Posted on: 2018-04-27  Degree: Master  Type: Thesis
Country: China  Candidate: C R Xu  Full Text: PDF
GTID: 2310330512486510  Subject: Operational Research and Cybernetics
Abstract/Summary:
With the launch of the Human Proteome Project (HPP) and the arrival of the post-genome era, massive amounts of protein sequence data have been generated and collected. Processing these data with experimental methods alone is unstable, expensive and time-consuming. A growing number of researchers therefore handle large collections of protein sequences with mathematics and computer science, based on the theory that sequence determines structure and function. They extract underlying structural and functional information from protein sequences to support and guide biological and medical experiments. Sequence data analysis can be used in a wide range of areas related to human health, including drug design and disease diagnosis. Compared with DNA and RNA, protein sequences are more difficult to analyze, because the composition and function of proteins are relatively complicated and diverse. Existing sequence-based tools usually have limitations, such as a lack of biological meaning, poor visualization, high time complexity and unsatisfactory accuracy. Therefore, we propose a novel 3D graphical representation of proteins with low time complexity, based on biological knowledge, statistical theory and information science. We then apply our approach to protein similarity/dissimilarity analysis and functional protein prediction. Our main work is as follows:

1. We propose an inverse CGR for codons, which clusters synonymous codons, and combine it with important physicochemical properties of amino acids to map protein sequences to 3-dimensional curves. We then modify the 2D moment vector to extract a feature vector from the 3-dimensional curves, which greatly reduces time complexity by avoiding the need to account for different sequence lengths.
2. We apply our new graphical representation to three typical protein sequence datasets to analyze protein similarity/dissimilarity, and obtain results that agree with the actual evolutionary relationships.
3. To verify our method further in other applications, we combine the 3D moment vector with amino acid composition and CMV, and feed the final vector into an SVM to predict anticancer peptides, bacterial adhesion and eukaryotic neurotoxic proteins. Based on 5-fold cross-validation, we obtained accuracies of 96% and 97.73% on the main and alternative datasets, respectively, and 88.82% and 86.11% on the balanced 1 and balanced 2 datasets, respectively. These results are better than Tyagi's. Compared with the methods in the references, our model also performed better on bacterial adhesion and eukaryotic neurotoxic proteins, reaching 92.75% and 98.00% accuracy, respectively.

In summary, our 3D protein graphical representation performed well on a wide range of protein sequence datasets and is biologically meaningful, time-saving and effective.
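
For readers unfamiliar with Chaos Game Representation, the following is a minimal sketch of the classic 2D CGR for nucleotide sequences. It is not the inverse CGR for codons proposed in the thesis, whose construction is not given in this abstract. The corner assignment in `CORNERS` and the function names `cgr_points` and `codon_point` are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Corner assignment of the unit square: one common convention,
# assumed here for illustration (not taken from the thesis).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Return the classic CGR trajectory of a nucleotide sequence.

    Starting from the centre of the unit square, each nucleotide moves
    the current point halfway towards that nucleotide's corner.
    Symbols other than A, C, G, T/U are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base == "U":  # treat RNA uracil like thymine
            base = "T"
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def codon_point(codon: str) -> Tuple[float, float]:
    """Return the final CGR point of a three-letter codon."""
    return cgr_points(codon)[-1]
```

In this classic form the most recently read nucleotide determines the quadrant of the point, so, for example, `codon_point("ATG")` falls in the quadrant of the G corner.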
Keywords/Search Tags:Chaos Game Representation, protein similarity, Support Vector Machine, anticancer peptides