Multi-Scale Encoding Of Amino Acid Sequences For Predicting Protein Interactions Using Gradient Boosting Decision Tree

Posted on: 2019-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhou
Full Text: PDF
GTID: 2370330593451070
Subject: Computer Technology and Engineering
Abstract/Summary:
Proteins are crucial for almost all functions in the cell, and protein-protein interactions (PPIs) play a key role in various biological functions such as DNA transcription, metabolic cycles, and signaling cascades. Therefore, identifying PPIs can provide great insight into protein functions and further biological processes. With the development of proteomics, many experimental techniques have been developed, but these experimental methods are costly and time-consuming. Hence, a number of computational methods have been proposed for predicting PPIs. However, the application of most existing methods is limited because they require information about the interaction marks of protein partners and about protein homology.

In this thesis, we present a computational approach for predicting PPIs that combines a multi-scale encoding representation of proteins with a gradient boosting decision tree classifier. Our work is as follows:

(1) The physicochemical characteristics of amino acids, including their qualitative and quantitative attributes, are used to encode a protein sequence at multiple scales. Five kinds of protein descriptors (frequency, composition, transformation, distribution, and auto covariance) are extracted from these encodings to represent each protein sequence, yielding a 347-dimensional vector for each protein sample. The multi-scale encoding scheme captures not only the compositional and positional information of the amino acids in the sequence but also their statistical significance.

(2) Based on this feature representation, the gradient boosting decision tree algorithm is introduced to predict the protein interaction class.

When the proposed method is tested on the PPI data of S. cerevisiae, it achieves a prediction accuracy of 95.28% with a Matthews correlation coefficient of 90.68%. Compared with state-of-the-art works on H. pylori and Human, the accuracies can be raised to 89.27% and 98.00%, respectively. Extensive experiments are also performed on a crossover protein-protein interaction network, and the prediction accuracies there are very promising as well. Because of the learning capabilities of the gradient boosting decision tree and the multi-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies.
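
As a rough illustration of two of the descriptor families and the evaluation metric named in the abstract, the following Python sketch computes amino acid frequencies, auto covariance over one quantitative encoding, and the Matthews correlation coefficient. It is not the thesis's implementation: the abstract does not list the physicochemical properties, the scales, or the lag values used, so the Kyte-Doolittle hydropathy index, the function names, and the `max_lag` parameter are all assumptions chosen for illustration, and the output is not the 347-dimensional vector described above.

```python
import math

# ASSUMPTION: the Kyte-Doolittle hydropathy index stands in for one
# quantitative physicochemical property; the abstract does not name the
# properties actually used in the thesis.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Map each residue to its hydropathy value (one quantitative encoding)."""
    return [HYDROPATHY[aa] for aa in sequence]


def frequency(sequence):
    """Fraction of the sequence made up by each of the 20 standard residues."""
    n = len(sequence)
    return {aa: sequence.count(aa) / n for aa in HYDROPATHY}


def auto_covariance(values, lag):
    """Mean product of mean-centred values that lie `lag` positions apart."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    total = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return total / (n - lag)


def describe(sequence, max_lag=3):
    """Concatenate 20 frequencies with auto covariance at lags 1..max_lag.

    `max_lag` is a hypothetical parameter, not a value from the thesis.
    """
    freqs = frequency(sequence)
    values = encode(sequence)
    features = [freqs[aa] for aa in sorted(HYDROPATHY)]
    features += [auto_covariance(values, lag) for lag in range(1, max_lag + 1)]
    return features


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For a protein pair, one plausible way to build a classifier input would be to concatenate the two `describe(...)` vectors and pass the result to a gradient boosting decision tree implementation; how the thesis combines the two proteins is not stated in the abstract.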
Keywords/Search Tags: Protein, Protein-protein interaction, Multi-scale, Gradient boosting decision tree