The Research On The Prediction Method Of Protein Succinylation Sites Based On PU Learning And Deep Learning Technology

Posted on:2023-02-09

Degree:Master

Type:Thesis

Country:China

Candidate:D Zhang

Full Text:PDF

GTID:2530306614972599

Subject:Computer technology

Abstract/Summary:

As post-genome projects progress and high-throughput biological sequencing techniques develop rapidly, biological data continues to grow explosively, and biological computing has penetrated all fields of biology. Take protein succinylation as an example: the task is to determine which lysine residues in an unknown protein sequence are succinylated. The traditional approach to this problem relies mainly on mass spectrometry, which requires an inordinate amount of time and substantial human and financial resources. A variety of computational methods have therefore been developed in recent years. These methods can identify protein succinylation sites efficiently and can assist experimental biologists in their research. Starting from the protein sequence and taking into account the annotation characteristics of succinylation site data, this thesis studies methods for identifying succinylation sites in depth. The main points are summarised below.

1. Based on the annotation background of succinylation site data, this thesis constructs a method based on Positive-Unlabeled Learning (PU Learning) to identify succinylation sites. In computational methods for predicting succinylation sites, the sites that have been annotated are usually regarded as positive samples, and the remaining lysine sites without any succinylation annotation are regarded as negative samples. In fact, some of these negative samples may be positive. This labelling practice therefore produces false negative samples, which affects prediction accuracy. To solve this problem, the PU bagging method is used to establish a new succinylation site prediction method, PUL_Succ. The main steps of this method are: first, randomly select data from the unlabeled samples and combine them with all positive samples to train a bagging classifier; then use the trained classifier to predict the out-of-bag samples and record their scores; finally, repeat these steps to roughly classify each unlabeled sample.

2. Because protein sequences are sequential data in which amino acid letters are arranged in a certain order, this thesis uses an LSTM network and a CNN to construct a hybrid model, Deep Succ, to identify succinylation sites. First, building on previous feature-encoding evaluation work, five well-performing feature encodings are selected to characterize succinylation samples: one-hot, BLOSUM62, ACF, AAindex, and CKSAAP. Second, four network models (LSTM-CNN, CNN-LSTM, LSTM, and CNN) are constructed from the LSTM network and the CNN, and each of the five selected feature encodings is fed into each of the four models for training and evaluation. Based on the performance of each model, the best-performing one is chosen to construct the hybrid model Deep Succ, which is composed of five sub-modules that integrate heterogeneous information. The ten-fold cross-validation and independent test set results show that Deep Succ has good robustness.
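
The PU bagging steps in point 1 can be sketched as follows. This is a minimal illustration only, not the PUL_Succ implementation: the abstract does not name the base classifier, the bag size, or the number of rounds, so the toy nearest-centroid classifier, the function name `pu_bagging_scores`, and the parameters `n_rounds`, `bag_size`, and `seed` are all assumptions introduced here. Feature vectors are taken to be plain lists of numbers.

```python
import random


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy base classifier (an assumption), used to keep the sketch dependency-free."""

    def fit(self, pos, neg):
        self.c_pos = centroid(pos)
        self.c_neg = centroid(neg)
        return self

    def score(self, x):
        # Higher when x is closer to the positive centroid; 0.5 on an exact tie.
        d_pos, d_neg = sq_dist(x, self.c_pos), sq_dist(x, self.c_neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total


def pu_bagging_scores(positives, unlabeled, n_rounds=50, bag_size=None, seed=0):
    """Average out-of-bag score per unlabeled sample (None if never out of bag)."""
    rng = random.Random(seed)
    k = bag_size or len(positives)  # assumed default: as many draws as positives
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Step 1: bootstrap-sample unlabeled points and treat them as negatives.
        bag = [rng.randrange(len(unlabeled)) for _ in range(k)]
        clf = NearestCentroid().fit(positives, [unlabeled[i] for i in bag])
        # Step 2: score only the unlabeled points left out of this bag.
        in_bag = set(bag)
        for i, x in enumerate(unlabeled):
            if i not in in_bag:
                totals[i] += clf.score(x)
                counts[i] += 1
    # Step 3: after all rounds, average each sample's recorded scores.
    return [t / c if c else None for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Hypothetical 2-D features: the first two unlabeled points resemble positives.
    pos = [[2.0, 2.1], [1.9, 2.0], [2.2, 1.8], [2.0, 1.9]]
    unl = [[2.1, 2.0], [1.8, 2.2], [-2.0, -2.1], [-1.9, -2.0], [-2.2, -1.8]]
    for features, s in zip(unl, pu_bagging_scores(pos, unl)):
        print(features, s)
```

Unlabeled samples with high averaged scores are candidates for hidden positives, while low-scoring ones are more plausible negatives. Any threshold for that rough classification would have to be chosen separately; the abstract does not give one.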

Keywords/Search Tags:

Succinylation modification, Positive-Unlabeled learning, Deep learning, Long short-term memory network, Convolutional Neural Networks

Related items

1	Investigating Deep Neural Networks For Gravitational Wave Evaluation With Deep Learning Ligo Data
2	Research On Deep-Ocean Remote Sensing By Deep Learning
3	Reconstruction Of Central Arterial Pressure Signal Based On Long Short-term Memory Network
4	Research On Short Term Forecast Of Fog Based On Deep-Learning
5	Research On Flash Flood Forecasting Based On Long Short-Term Memory Networks
6	Research On Magnetotelluric Intelligent Inversion Algorithm Based On Deep Learning
7	Application Of Long Short-term Memory Network In Short-term Rainfall
8	Research And Application Of Landslide Susceptibility Prediction Based On Long Short-term Memory Deep Neural Network
9	Sea Ice Classification Based On Deep Learning With SAR Imagery
10	Research On The Identification Of HVDC Transmission Interference Based On Deep Learning Of Geomagnetic Observation Data