An Ensemble Algorithm To Predict Coenzyme A-associated Proteins Using Sequence And Network Information

Posted on: 2021-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: B L Fan
Full Text: PDF
GTID: 2370330611983357
Subject: Agricultural Information Engineering
Abstract/Summary:
Coenzyme A (CoA) plays a crucial role in a variety of cellular functions and metabolic pathways. Accurate recognition of CoA-associated proteins (CAPs) helps deepen our understanding of important biological processes such as acyl transfer, carboxylic acid metabolism and the tricarboxylic acid cycle. Detecting interactions between CoA and proteins experimentally is time-consuming and laborious, so computational methods can compensate for the shortcomings of experimental techniques. Although a few computational studies have analyzed CAPs from different perspectives, few have systematically characterized these proteins at the sequence, evolution, structure and network levels and tried to integrate this information into theoretical prediction models. This research is the first exploration in this area.

In this study, we propose an ensemble algorithm based on sequence and network information to predict CAPs. First, the algorithm combines a machine learning method with a template-based method to build a prediction model for identifying CoA-binding residues, and uses the distribution characteristics of the predicted binding residues to further predict the associated proteins. Experimental results show that CAPs are more likely than other proteins to physically bind CoA or its derivatives. Then, using sequence and network information, six further sub-classifiers were designed, based respectively on word vectors, the number of long-distance templates, evolutionary conservation, amino acid composition, predicted structural features and network features. Comparative analysis showed that, relative to non-CAPs (NCAPs), CAPs have more distant homologues, an older protein age, a more ordered and hydrophobic molecular conformation, and denser connections in the protein interaction network. The seven sub-classifiers were then evaluated on the training sets of human, mouse and Arabidopsis; every classifier could be used to predict CAPs, with all AUCs greater than 0.7.

To combine information from the different sources, a two-layer stacking ensemble algorithm was constructed on the output probabilities of the sub-classifiers. The AUCs for human, mouse and Arabidopsis were 0.990, 0.985 and 0.981 on the training sets and 0.965, 0.969 and 0.968 on the testing sets, respectively. These results indicate that integrating more aspects of information helps identify CAPs more accurately. Finally, all reviewed proteins of the three species in the UniProt database were tested independently. GO and KEGG analyses were performed on the 165 human, 206 mouse and 231 Arabidopsis proteins with high prediction scores. These proteins were mainly enriched in mitochondrial and chloroplast components, various amino acid metabolic pathways and the tricarboxylic acid cycle pathway. These observations are consistent with the function of CAPs, which supports the validity of the model and helps to further understand the interaction mechanism between CoA and its associated proteins.
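
The two-layer stacking idea in the abstract (sub-classifier probabilities in the first layer, a combining model in the second) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The sub-classifier outputs are simulated random numbers rather than real data, the logistic-regression meta-learner is an assumption because the abstract does not name the second-layer model, and all function names are hypothetical. In a real stacking setup the first-layer probabilities fed to the second layer would normally be out-of-fold predictions.

```python
import math
import random

def auc(labels, scores):
    """AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate_sub_outputs(n_proteins, n_sub=7, seed=0):
    """Simulated layer-1 output: one probability per sub-classifier per protein.

    Placeholder data only: each hypothetical sub-classifier is weakly informative.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_proteins):
        y = i % 2  # alternate CAP (1) and non-CAP (0) so both classes are present
        centre = 0.6 if y == 1 else 0.4
        rows.append([min(1.0, max(0.0, rng.gauss(centre, 0.2))) for _ in range(n_sub)])
        labels.append(y)
    return rows, labels

def _sigmoid_score(w, b, x):
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))

def fit_meta(rows, labels, lr=1.0, epochs=500):
    """Layer 2 (assumed): logistic regression fitted by batch gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = _sigmoid_score(w, b, x) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_meta(model, rows):
    """Combined probability for each protein from its sub-classifier probabilities."""
    w, b = model
    return [_sigmoid_score(w, b, x) for x in rows]

# Train the meta-learner on one simulated set, evaluate on another.
train_rows, train_labels = simulate_sub_outputs(200, seed=1)
test_rows, test_labels = simulate_sub_outputs(200, seed=2)
model = fit_meta(train_rows, train_labels)
stacked_auc = auc(test_labels, predict_meta(model, test_rows))
single_aucs = [auc(test_labels, [row[j] for row in test_rows]) for j in range(7)]
print("single sub-classifier AUCs:", [round(a, 3) for a in single_aucs])
print("stacked AUC:", round(stacked_auc, 3))
```

On this simulated data the stacked score combines seven weak, independent signals, which is the same intuition behind the abstract's observation that integrating more sources of information improves the AUC. The numbers printed here say nothing about the thesis's actual results.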
Keywords/Search Tags:Coenzyme A and its derivatives, sequence and network information, binding sites, associating proteins, ensemble learning