Research On Computational Methods Of Protein Liquid–liquid Phase Separation

Posted on: 2024-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: S M Zhou
GTID: 2530307295459704
Subject: Mathematics
Abstract/Summary:
Liquid-liquid phase separation (LLPS) is a phenomenon that biomolecules undergo during life activities and a universal mechanism for the spatiotemporal coordination of intracellular biological activities. Protein LLPS is involved in a wide range of biological regulation and is closely related to metabolic diseases, Alzheimer's disease, cancer and other diseases. In this thesis, traditional machine learning and deep learning algorithms are combined with protein sequence information, structural information and evolutionary information to build effective prediction models for protein LLPS. The main research contents are as follows:

(1) Based on whether the protein concentration recorded in the experimental conditions in the LLPSDB database is below 100 μM, proteins that undergo LLPS are divided into two types of data: strong LLPS and weak LLPS. Seven sequence features, four evolutionary information features and three embedding features were then combined with five traditional machine learning methods to construct prediction models, and the four evolutionary information features were combined with two deep learning methods to construct prediction models. In 10-fold cross-validation, the AUC of the deep learning model based on evolutionary information features reached 0.79, 0.97 and 0.95 on the three sets of binary classification data. Compared with the existing model DeePhase, our model improves the AUC on the three sets of data by 0.03, 0.01 and 0.05, respectively. Although the AUC for distinguishing the LLPS+ and PDB* data sets increases by only 0.01, the recall and precision are 0.04 and 0.10 higher than those of DeePhase, respectively.

(2) Based on whether an LLPS protein needs to cooperate with other biomacromolecules to undergo LLPS, this thesis collected 128 proteins that undergo LLPS spontaneously and 214 proteins that undergo LLPS cooperatively. A traditional machine learning model based on embedding features and a deep learning model based on evolutionary information features were constructed to predict these two types of LLPS proteins. On the two datasets, the deep learning model based on evolutionary information features performed well, with AUCs of 0.96 and 0.93, respectively. Furthermore, when compared with various existing models on six independent test sets, our model shows superior performance.

(3) Comprehensive and accurate data collection is the key to building prediction models. Based on PhaSepDB V2.1, LLPSDB V2.0 and PhaSePro, three datasets were collected: LLPS proteins, spontaneous LLPS proteins, and LLPS proteins that require partners. Using these comprehensive data, the CNN deep learning model PredLLPS_PSSM, based on evolutionary information features, was constructed. We also collected a group of newly discovered LLPS proteins from insects. PredLLPS_PSSM performed better than existing advanced prediction methods. Furthermore, feature analysis was carried out by comparing positive and negative samples, and the SHapley Additive exPlanations (SHAP) algorithm was used to explain the reasons for the model's performance. Finally, the PredLLPS_PSSM online prediction platform http://47.92.65.100/zsm/ was developed for the convenience of researchers.
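The abstract reports AUC, precision and recall under 10-fold cross-validation. As a rough illustration of how these quantities are defined, here is a minimal pure-Python sketch. It is not the thesis code: the function names (`k_fold_indices`, `auc_score`, `precision_recall`), the 0.5 decision threshold and the toy labels and scores are all assumptions made up for this example, and the fold assignment is a plain round-robin split rather than whatever splitting scheme the author used.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[List[int]]:
    """Assign sample indices to k folds round-robin (no shuffling, no stratification)."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, which matches the Mann-Whitney U view of AUC.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (made up): 1 = LLPS protein, 0 = non-LLPS protein.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
# Hypothetical decision threshold of 0.5 to turn scores into class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print("AUC:", auc_score(toy_labels, toy_scores))
print("precision, recall:", precision_recall(toy_labels, toy_preds))
print("fold sizes for 25 samples:", [len(f) for f in k_fold_indices(25, 10)])
```

On the toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. The example also shows why a small AUC gain can coexist with a larger gain in precision and recall: AUC depends only on the ranking of scores, whereas precision and recall depend on where the decision threshold falls.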
Keywords/Search Tags: Liquid-liquid Phase Separation, Sequence Information, PSSM, Evolutionary Information, Deep Learning