
Prediction Of Protein Structure Classes And Topology Analysis Of Protein Interaction Network Based On Support Vector Machine

Posted on: 2019-02-28 | Degree: Master | Type: Thesis
Country: China | Candidate: J Wang | Full Text: PDF
GTID: 2370330575480666 | Subject: Engineering
Abstract/Summary:
Advances in protein and DNA sequencing technology, particularly since the start of the Human Genome Project (HGP), have led to exponential growth in protein data. At the same time, the gap between the number of known protein sequences and the number of known protein structures is widening rapidly, so a computational system that determines structures and functions automatically is badly needed. Applying artificial intelligence and pattern recognition methods to the classification of protein structure and function is one of the most important research fields and challenges in bioinformatics. A large body of published results indicates that machine learning and pattern recognition approaches have been particularly successful in protein structure prediction. Until new fast and highly efficient laboratory techniques appear, identifying protein structures, functions, and other biological information from the huge volume of protein sequences by combining theories from computer science, mathematics, and molecular biology should remain the primary task and the most important focus of study. The main purpose of this thesis is to predict protein structural class and MicroRNA targets from primary sequences using supervised and semi-supervised methods. Protein structural class prediction is the most important part of protein structure prediction; it makes it possible to grasp the overall folding pattern of a protein. Protein structural class identification can therefore provide useful information for predicting three-dimensional structure and function. Levitt and Chothia proposed four protein structural classes: all-α (proteins composed of α-helices); all-β (proteins composed of β-sheets); α/β (proteins in which β-sheets and α-helices alternate); and α+β (proteins with separate β-sheet and α-helix regions, in which the β-sheets are generally antiparallel). The main approaches to protein structural class prediction are pattern recognition methods and laboratory methods based on spectral data. This thesis completes two pieces of work: protein structural class prediction using SVM and an ensemble feature representation method (Chapter 2), and simulation of protein-protein interaction networks.

(1) First, we randomly extracted protein sequence data for three classes (all-α, all-β, and α+β) from the RCSB PDB database; a total of 90 sequences remained after BLAST processing. Next, four representation models were used to describe the information in the amino acid sequence: Huffman coding, PseAAC, Huffman coding combined with PseAAC, and character probability combined with PseAAC. Multiclass support vector machines were trained on each of the four models using the "one against one" and "one against rest" decomposition strategies. The results indicate that the classifier built on the Huffman coding model has low accuracy, which suggests that Huffman coding describes structural class information poorly. The PseAAC model yields higher accuracy than Huffman coding. Because Huffman coding reflects the sequence-order specificity of the encoding to some degree, we also combined it with PseAAC, but the resulting classifier was less accurate than PseAAC alone. The model combining character probability with PseAAC achieved high accuracy. All four models, however, showed predictive bias under the "one against rest" strategy: when the α+β class was the positive class, prediction accuracy was below 30%. To stabilize the model, we weighted the penalty parameter C for the positive class. The results show that this effectively resolves the bias in the PseAAC model and in the combined character probability and PseAAC model. We then compared classifier stability between the two decomposition strategies and found that "one against rest" generalizes better after the bias adjustment. We conclude that training with "one against rest" on the combined character probability and PseAAC model gives the best classifier.

(2) Proteins are the basis of all life activities, but few proteins act independently in nature. Proteins often achieve a given function through complex interactions, physical contact, or chemical reactions. Studying protein interactions is biologically significant, and as technology has advanced, protein interaction has come to be treated as a complex system, which also poses new problems for computer technology. Representing protein interaction networks as complex networks is a common approach to studying them. First, this thesis uses complex network concepts to represent the protein interaction network covered by the Giot2003a data set in the DIP database and, on this static network, calculates the main topological parameters: degree centrality, betweenness centrality, subgraph centrality, characteristic path length, clustering coefficient, and other network topology features. Based on these calculations, the protein interaction network was found to have a power-law degree distribution and scale-free and small-world characteristics. We hypothesize that the protein interaction network may contain a large number of tetrahedral structures and propose a network model based on the tetrahedron: the base is a hierarchical tetrahedral structure, and shortcuts are added starting from this base to form a cluster of tetrahedral complex networks. Simple C code then generates edge lists for different probabilities and numbers of edges, and R scripts build the corresponding networks. Because shortcuts are added from the top to the bottom, the node distribution becomes uneven. The topological parameters of the network clusters verify that the tetrahedral clusters are complex networks. These parameters were then compared with those of the largest connected subgraph of the protein interaction network, and the degree, clustering coefficient, and characteristic path length were found to be similar. We conclude that the complex network clusters obtained from the tetrahedron model can simulate protein interaction networks.
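
To make the topological parameters named in part (2) concrete, the following is a minimal Python sketch of three of them: degree centrality, clustering coefficient, and characteristic path length. It is an illustration only, not the thesis's C/R implementation. The function names and the toy edge list (one tetrahedron with a fifth node attached) are assumptions introduced here, and betweenness and subgraph centrality are left out for brevity.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes.

    Assumes the graph is connected, e.g. a largest connected subgraph.
    """
    n = len(adj)
    total = sum(sum(shortest_path_lengths(adj, s).values()) for s in adj)
    return total / (n * (n - 1))


# Hypothetical toy network: a tetrahedron (nodes 0-3) plus node 4 attached to node 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
adj = build_graph(edges)

print(degree_centrality(adj))
print(average_clustering(adj))
print(characteristic_path_length(adj))
```

A high average clustering coefficient together with a short characteristic path length is the usual signature of a small-world network, which is the kind of comparison the abstract describes between the tetrahedral clusters and the protein interaction network.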
Keywords/Search Tags:Machine learning, Protein Structure Prediction, Protein-Protein Interaction, Complex Network