
Identification Of Protein Post-Translational Modification Site And Its Association With Disease

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Huang
GTID: 2404330611995991
Subject: Pharmaceutical
Abstract/Summary:
By covalently attaching new functional groups to polypeptide chains, post-translational modifications (PTMs) can determine a protein's activity state, localization, and interactions with other proteins, greatly expanding proteome diversity. PTMs are at the heart of many cellular signaling events and are directly linked to many diseases. Despite their importance for biological function, large-scale study of PTMs has been hampered by a lack of suitable methods. Experimental methods are time-consuming and labour-intensive, and they struggle to keep pace with the rapid growth of protein primary structure data. Developing computational approaches to identify PTM sites and study their association with diseases is therefore urgent. Based on random forest, this study proposes a series of methods to study human PTM sites and their associations with diseases. The main contents are as follows:
1. Developed HydLoc, a random-forest-based tool for predicting human hydroxylation sites from protein sequence information and the physicochemical properties of amino acids. Leave-one-out cross-validation on the training dataset gave accuracies of 84.25% and 80.61% for human hydroxyproline and hydroxylysine prediction, respectively. On an independent test dataset, the prediction accuracies for residues P and K were 90.74% and 81.25%, respectively, outperforming the existing method.
2. Proposed a random-forest-based method for identifying human phosphotyrosine sites that selects more reliable negative samples. Leave-one-out cross-validation showed a prediction sensitivity of 80.35%, which was 15.34% higher than that of the model trained with randomly selected negative samples.
3. Proposed a method, termed RFEW, for identifying disease-associated phosphorylation sites, based on random forest coupled with a random walk in a phosphorylation site-disease bilayer network. Leave-one-out cross-validation showed that the AUC for identifying phosphorylation sites associated with Breast Neoplasms, Alzheimer Disease, and Ovarian Neoplasms reached 0.9285, 0.9789, and 0.9021, respectively, which was better than the existing method.
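As an illustration of how "protein sequential information" around a candidate residue might be turned into classifier input, the following is a minimal sketch only. The abstract does not describe the actual feature encoding, so the window size (`flank`), the padding symbol `X`, the one-hot scheme, and the function names `extract_windows` and `one_hot` are all assumptions made for this example, not the thesis's method.

```python
# Hypothetical sketch: window size, padding symbol, and one-hot encoding
# are assumptions for illustration, not details taken from the thesis.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # placeholder for positions beyond the sequence ends


def extract_windows(sequence, target, flank=3):
    """Return (1-based position, window) pairs centred on each `target` residue."""
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == target:
            # Index i in `sequence` sits at index i + flank in `padded`,
            # so the window of width 2*flank + 1 starts at index i.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def one_hot(window):
    """Encode a window as a flat 0/1 vector with 20 entries per position.

    Padding (or any non-standard residue) becomes an all-zero block.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector
```

For example, `extract_windows("MKPLK", "K", flank=2)` yields one window per lysine, and each window can be passed to `one_hot` to obtain a fixed-length vector suitable for a classifier such as a random forest. Physicochemical property values, which the abstract also mentions, could be appended to the same vector.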
Keywords/Search Tags: Post-translational modification, Protein, Random forest, Network, Disease