
Protein Lysine Modification: Database Construction and Bioinformatics Prediction

Posted on: 2020-06-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H D Xu
GTID: 1360330590959048
Subject: Bio-IT
Abstract/Summary:
Protein lysine modification is an important class of post-translational modification in vivo. It occurs after translation and involves the covalent attachment of small chemical groups or small proteins to specific lysine residues of substrate proteins. Lysine post-translational modifications affect protein structure, activity, and localization. Through these effects they take part in many biological processes, such as protein degradation, cell differentiation, gene expression, DNA replication and damage repair, metabolism, and autophagy. Abnormal lysine modification status is often closely associated with diseases such as cancer. In recent years, the rapid development of modification-specific antibodies and mass spectrometry has generated a large amount of protein lysine modification data. An urgent problem in this field is how to collect, integrate, and analyze these data and extract information that can guide experimental research. This work therefore carried out a systematic bioinformatics study of protein lysine modification.

We first constructed PLMD 3.0, a comprehensive database of protein lysine modifications. Through literature searches and database integration, we collected 284,780 experimentally validated lysine modification sites on 53,501 proteins from 176 species. These sites cover 20 types of lysine modification: nine acylations, four ubiquitin and ubiquitin-like modifications, and seven other types. Analysis of the PLMD dataset showed that 16 of the lysine modifications have significant sequence motifs. We also found 65,297 lysine sites with in situ crosstalk, indicating that in situ crosstalk among different lysine modifications occurs frequently. To date, PLMD is the protein lysine modification database covering the largest numbers of modification types, species, proteins, and modification sites.

Abnormal lysine modification status is also closely associated with the onset and progression of various diseases. To better understand how lysine modification regulates biological processes and how it relates to disease, we further constructed PTMD 1.0, a database of post-translational modification information related to human diseases. Besides lysine modification-disease associations, PTMD also includes other post-translational modifications (PTMs) and disease annotations. By manual literature curation we collected 1,950 PTM-disease associations (PDAs), located on 749 proteins and covering 23 PTMs and 275 disease types. All known PDAs were classified into six categories according to how PTM status influences disease. The results indicate that up-regulation of PTM status and the emergence of PTMs are more closely associated with disease than other changes. They also suggest that multiple PTMs may interfere with one another during the development of complex diseases. A disease-gene network showed that breast cancer is the disease most closely associated with changes in PTM status. At the substrate level, abnormal PTM status on substrates of the important protein kinase AKT1 is the most relevant to disease. With its detailed annotations, PTMD can serve as a useful resource for further analysis of the relationship between PTMs and human disease.

Computational models trained on the high-quality datasets in these databases provide an alternative approach to identifying potential lysine modification sites on proteins. In this work, we developed HybridSucc, a new lysine succinylation (Ksucc) site prediction tool based on a hybrid learning framework. By integrating databases such as PLMD 3.0 with literature searches, we collected 26,243 experimentally validated Ksucc sites on 8,830 proteins from 13 organisms. We then used three traditional machine learning algorithms, penalized logistic regression (PLR), support vector machine (SVM), and random forest (RF), to systematically evaluate the predictive power of seven protein sequence features and three structural features. The results show that all ten features are informative. We also implemented a deep neural network (DNN) framework and evaluated the same ten features. Deep learning and traditional machine learning algorithms each showed distinct advantages on different features. By combining DNN and PLR, we then developed HybridSucc, which performs significantly better than existing succinylation prediction tools. Using HybridSucc, we screened the whole proteome for potentially functional Ksucc sites and identified 5,251 known and 3,615 potential functional Ksucc sites. We further mapped cancer mutations from The Cancer Genome Atlas (TCGA) onto human Ksucc substrates and defined Ksucc-related mutations (KsuMs). To estimate the impact of cancer mutations on Ksucc sites, we developed a new statistical approach, the gradual distribution of probability density (GDPD). We identified 370 high-potential KsuMs in 218 genes. These include several well-studied tumorigenesis-related genes, such as pyruvate kinase M2 (PKM2), serine hydroxymethyltransferase 2 (SHMT2), and isocitrate dehydrogenase 2 (IDH2).

In summary, this work focuses on protein lysine modification and its relationship with disease. First, we collected and integrated multiple types of lysine modification sites from different species to construct a comprehensive database of protein lysine modifications. Second, to further understand how lysine modification regulates biological processes and how it relates to disease, we constructed a database of post-translational modification information related to human diseases. Finally, we integrated deep learning with traditional machine learning and trained on the high-quality datasets in the database. This produced a new Ksucc prediction tool based on a hybrid learning framework. Together, this work provides a new strategy for studying the identification, molecular mechanisms, and regulation of protein lysine modification sites.
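The abstract counts lysine sites with in situ crosstalk, meaning the same lysine is modified by different modification types. The following minimal Python sketch shows one way such sites might be counted. It assumes that each curated record is a (protein ID, position, modification type) tuple. It treats a lysine as a crosstalk site when two or more distinct modification types are recorded at that lysine. The record layout, the function name `find_crosstalk_sites`, and the toy data are hypothetical and are not taken from PLMD itself.

```python
from collections import defaultdict


def find_crosstalk_sites(records):
    """Map each (protein_id, position) carrying two or more distinct
    modification types to the sorted list of those types.

    `records` is an iterable of (protein_id, position, modification_type)
    tuples; this layout is an assumption made for the sketch.
    """
    types_by_site = defaultdict(set)
    for protein_id, position, mod_type in records:
        # A set collapses repeated evidence for the same modification type.
        types_by_site[(protein_id, position)].add(mod_type)
    return {
        site: sorted(types)
        for site, types in types_by_site.items()
        if len(types) >= 2
    }


# Toy records with hypothetical protein IDs and positions.
records = [
    ("PROT1", 12, "acetylation"),
    ("PROT1", 12, "succinylation"),
    ("PROT1", 40, "ubiquitination"),
    ("PROT2", 7, "acetylation"),
    ("PROT2", 7, "acetylation"),  # duplicate evidence, still one type
]
crosstalk = find_crosstalk_sites(records)
print(crosstalk)  # {('PROT1', 12): ['acetylation', 'succinylation']}
```

Because each site's types are stored in a set, repeated reports of the same modification at one lysine are not mistaken for crosstalk.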
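Site predictors such as HybridSucc are trained on the experimentally validated Ksucc sites described in the abstract, and a common way to build such training data is to cut a fixed-length peptide window centred on each lysine. The sketch below shows one such labelling scheme. Several details are illustrative assumptions and are not confirmed by the abstract: the window half-width (`flank=10`), the padding character `*`, the function names, and the convention that lysines without experimental evidence are negatives.

```python
def lysine_windows(sequence, flank=10, pad="*"):
    """Yield (position, peptide) for every lysine (K) in `sequence`.

    `position` is 1-based. Each peptide has length 2*flank + 1 with the
    lysine at its centre; positions beyond either terminus are filled with
    `pad`. flank=10 is an assumed, illustrative window half-width.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of `sequence` sits at index i + flank in `padded`,
            # so its window spans padded[i] .. padded[i + 2*flank].
            yield i + 1, padded[i : i + 2 * flank + 1]


def label_windows(sequence, known_sites, flank=10):
    """Return (position, peptide, label) triples.

    Assumption: lysines listed in `known_sites` (1-based positions) are
    positives (label 1); all other lysines are treated as negatives (label 0).
    """
    return [
        (pos, peptide, int(pos in known_sites))
        for pos, peptide in lysine_windows(sequence, flank)
    ]


# Toy sequence and site set, invented for illustration only.
dataset = label_windows("MKAKLK", known_sites={2}, flank=2)
for row in dataset:
    print(row)
# (2, '*MKAK', 1)
# (4, 'KAKLK', 0)
# (6, 'KLK**', 0)
```

Padding keeps every window the same length, so lysines near either terminus can still be encoded as fixed-size sequence features.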
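The abstract describes mapping TCGA cancer mutations onto Ksucc substrates to define Ksucc-related mutations (KsuMs). That mapping step can be sketched as a simple proximity check. In this hypothetical version, a mutation counts as Ksucc-related when it falls within `flank` residues of a Ksucc site on the same protein. The window size, the tuple layout, the function name, and the toy identifiers are all assumptions made for illustration. The GDPD scoring used in the dissertation is not reproduced here.

```python
def ksucc_related_mutations(sites, mutations, flank=7):
    """Return (protein_id, mut_pos, change, site_pos) for every mutation
    lying within `flank` residues of a Ksucc site on the same protein.

    `sites` is an iterable of (protein_id, position) pairs; `mutations` is an
    iterable of (protein_id, position, aa_change) triples. flank=7 is a
    hypothetical window half-width chosen only for illustration. A mutation
    near several sites is reported once per site.
    """
    # Group site positions by protein, sorted for deterministic output.
    sites_by_protein = {}
    for protein_id, pos in sorted(sites):
        sites_by_protein.setdefault(protein_id, []).append(pos)

    hits = []
    for protein_id, mut_pos, change in mutations:
        for site_pos in sites_by_protein.get(protein_id, []):
            if abs(mut_pos - site_pos) <= flank:
                hits.append((protein_id, mut_pos, change, site_pos))
    return hits


# Toy data with hypothetical identifiers; not real TCGA records.
sites = {("PROT1", 100), ("PROT1", 108), ("PROT2", 50)}
mutations = [
    ("PROT1", 104, "R104C"),  # within 7 of both sites 100 and 108
    ("PROT1", 130, "A130T"),  # too far from any site
    ("PROT2", 45, "E45K"),    # within 7 of site 50
]
ksums = ksucc_related_mutations(sites, mutations)
print(ksums)
# [('PROT1', 104, 'R104C', 100), ('PROT1', 104, 'R104C', 108), ('PROT2', 45, 'E45K', 50)]
```

A real pipeline would then need a statistical step to rank these candidates. In the dissertation that step is GDPD; this sketch only produces the candidate pairs.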
Keywords/Search Tags: Protein lysine modification, Succinylation, Human disease, Deep learning, Bioinformatics, Database