A Study on the Prediction of Protein Ubiquitination Sites Based on Deep Learning

Posted on: 2022-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: D P Liu
Full Text: PDF
GTID: 2480306491454984
Subject: Computer application technology
Abstract/Summary:
Protein ubiquitination plays a key regulatory role in many life processes, including cell signal transduction, cell cycle regulation, DNA damage response, and cellular immune response. Researchers have found changes in protein ubiquitination in many genetic degenerative diseases, which indicates that it is closely related to the occurrence of these diseases. Revealing its regulatory mechanism is therefore of great significance for the diagnosis and treatment of these diseases and for the development of related drugs. A large number of ubiquitinated proteins and their modification sites have been identified by biological experiments. However, it is difficult to identify large numbers of ubiquitination sites rapidly through biological experiments alone, so bioinformatics methods have been introduced into the field. Current prediction algorithms generally suffer from too few sequence features, outdated classification algorithms, unreliable negative samples, and an imbalance between positive and negative samples, all of which limit prediction performance to some extent. Meanwhile, the large accumulation of data makes it possible to apply deep learning methods effectively to this problem, but prediction programs based on deep learning are still rare, and many deep models with excellent performance have not yet been applied.

To address these problems, we developed an algorithm that uses deep learning techniques to predict ubiquitination sites from multiple features extracted from protein sequences. The algorithm extracts eight sequence features and five structural features from each protein sequence, and the dimensionality of the original feature vectors is then reduced by feature selection. On this basis, we developed a semi-supervised deep learning framework for predicting protein ubiquitination sites. The framework is divided into three processing phases. First, we improve the reliability of the negative samples by constructing an anomaly detection algorithm, GANomaly, based on a semi-supervised generative adversarial network model. Then, new positive samples are continually generated by a generative adversarial network model to enlarge the positive sample set and mitigate the imbalance of the training set. Finally, ubiquitination modification sites are identified by training a deep neural network classifier containing multiple convolutional and fully connected layers.

We evaluated the performance of the algorithm through a series of experiments. First, we constructed four datasets of different sizes and species composition by collecting ubiquitinated proteins and modification site data from different species. Then, 10-fold cross-validation was performed on the training sets. The results show that our algorithm outperforms existing algorithms in prediction performance. Next, tests were performed on independent test sets. The results show that our algorithm also performs better than existing algorithms there, although the unreliability of the test sets limits the margin of improvement to some extent. Finally, we analyzed how much each strategy in the algorithm contributes to the improvement in prediction performance. The results show that both extracting reliable negative sample sets and expanding the positive sample sets improve performance, and that using the two strategies together yields a larger improvement.
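The 10-fold cross-validation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the folds were built; stratifying by class so that every fold keeps roughly the same positive-to-negative ratio is an assumption made here because the abstract stresses sample imbalance. The function name `stratified_k_fold` and the toy labels are hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs in which each fold keeps
    roughly the overall class ratio (an assumed design choice)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin so that
    # every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the held-out set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical toy labels: 1 = ubiquitination site, 0 = non-site (ratio 1:4).
labels = [1] * 20 + [0] * 80
splits = list(stratified_k_fold(labels, k=10))
```

In this toy setting each held-out fold contains 10 samples, 2 of them positive, so a classifier is always evaluated on data with the same imbalance as the full set.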
Keywords/Search Tags: ubiquitination, site prediction, deep learning, semi-supervised learning, generative adversarial networks