Artificial Intelligence Biology Study On Prediction Of Protein Post-translational Modifications And Functions

Posted on: 2022-03-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W S Ning
GTID: 1480306572475694
Subject: Biomedical engineering
Abstract/Summary:
Proteins are key molecules that regulate life activities, and post-translational modification (abbreviated here as "modification") is an important mechanism for regulating protein function. Modifications of specific amino acid residues, such as phosphorylation, succinylation and S-palmitoylation, participate in almost all biological processes by dynamically changing the conformation, activity and subcellular localization of proteins. Abnormalities in protein modification and function are closely related to the occurrence and development of human diseases. Therefore, systematically integrating the biomedical big data related to protein modification and function, and using cutting-edge artificial intelligence technology, represented by deep learning algorithms, to design, optimize and refine computational methods for predicting modification substrates, sites and functions can provide important reference information for further experimental research and clinical practice.

In this work, for the prediction of succinylation sites, we integrated 7 sequence features (pseudo-amino acid composition, composition of k-spaced amino acid pairs, orthogonal binary coding, amino acid index, autocorrelation functions, group-based prediction system, and position-specific scoring matrices) and 3 structural features (accessible surface area, secondary structure and backbone torsion angles). We combined a deep neural network with a traditional machine learning algorithm, penalized logistic regression, in a novel hybrid-learning framework to build the computational tool HybridSucc, which achieved an area under the curve (AUC) value of 0.885 for general prediction of lysine succinylation (Ksucc) sites. In comparison, the accuracy of HybridSucc was 17.84% to 50.62% better than that of other existing tools. Using HybridSucc, we also designed a gradual distribution of probability density (GDPD) statistical method and predicted 370 cancer mutations that potentially affect succinylation.

For the prediction of S-palmitoylation sites, we designed the Graphic Presentation System (GPS) algorithm to build the GPS-Palm software, and proposed two new strategies, Number-to-Image Transformation (NIT) and Data Quality Discrimination (DQD), to convert the site data into image data after quality control. We then implemented a parallel convolutional neural network (pCNN) framework, whose accuracy was 31.3% higher in relative terms than that of the existing algorithm (0.855 vs. 0.651). We also developed the GPS-PBS software using transfer learning, which can accurately predict phosphorylation sites that specifically interact with phosphoprotein-binding domains.

Recently, we integrated knowledge from public databases covering multiple aspects, such as the genome, transcriptome, proteome, epigenome, and drug-target relations, and systematically carried out a functional analysis of liquid-liquid phase separation (LLPS) proteins. We speculated that important mutations participate in human diseases by changing the LLPS of proteins.

During the COVID-19 epidemic, we and our collaborators jointly carried out quantitative proteomic analysis of plasma samples from patients with COVID-19. Based on machine learning, we designed the Prioritization of Optimal biomarker Combinations for COVID-19 (POC-19) algorithm to predict and verify 11 new biomarkers. In addition, we collected, integrated, and annotated the chest CT images and clinical diagnosis data of 1,521 patients with COVID-19, and constructed a comprehensive database, iCTCF. Based on these data, we designed the Hybrid-learning for UnbiaSed predicTion of COVID-19 (HUST-19) artificial intelligence diagnosis software, which efficiently integrates CT images and clinical features. The system can not only accurately determine whether a patient is suffering from COVID-19 pneumonia, but also accurately predict the severity of the disease and the potential risk of death. From the clinical data of COVID-19, we predicted a number of protein markers that are significantly related to patient survival, providing important clues for COVID-19 diagnosis, pathological mechanism exploration and drug target discovery.

In summary, we integrated different types of complex data, such as sequence, structure, omics, imaging, and clinical features, and designed a series of new artificial intelligence frameworks. As a first step, we achieved accurate prediction of protein modification and function and provided important computational tools and reference information for further experimental research.
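
Two of the sequence features named in the abstract, orthogonal binary coding and the composition of k-spaced amino acid pairs, can be illustrated with a minimal Python sketch. This is not the HybridSucc implementation: the function names, the 20-letter alphabet ordering, the '*' padding convention, and the normalization by the number of valid pairs are all assumptions made for illustration.

```python
from itertools import product

# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_binary(peptide):
    """One-hot encode each residue as a 20-dimensional 0/1 vector.

    Characters outside the alphabet (e.g. a '*' padding symbol, assumed
    here) are encoded as all zeros.
    """
    vec = []
    for res in peptide:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec


def k_spaced_pair_composition(peptide, k):
    """Frequencies of residue pairs with exactly k residues between them.

    Returns a 400-dimensional vector ordered as AA, AC, ..., YY. Pairs that
    contain a non-standard character are skipped, and counts are divided by
    the number of valid pairs (an illustrative choice).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(peptide) - k - 1):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


if __name__ == "__main__":
    # Hypothetical sequence window centered on a lysine (K).
    window = "ACDEKGHIL"
    print(len(orthogonal_binary(window)))
    print(len(k_spaced_pair_composition(window, k=1)))
```

In a site predictor of this kind, vectors like these would typically be computed for a fixed-length window around each candidate residue and passed to the classifier, alone or concatenated with the other features.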
Keywords/Search Tags: Protein post-translational modification, Protein function, Artificial intelligence biology, Deep learning, Machine learning, Protein biomarker