Database Construction And Prediction Of G-Quadruplex Binding Proteins

Posted on:2023-06-19

Degree:Master

Type:Thesis

Country:China

Candidate:J Yang

GTID:2530307061452044

Subject:Biomedical engineering

Abstract/Summary:

A G-quadruplex (G4) is a nucleic acid secondary structure formed by stacked guanine tetrads. It plays an important role in biological processes such as transcription and translation. Biological experiments and bioinformatics analyses have shown that G-quadruplexes can recruit functional proteins, which then act on specific biological processes; these proteins are called G4-binding proteins. Only one G4-binding protein database (G4IPDB), published in 2016, is available, and because detection is technically difficult and costly, the number of identified G4-binding proteins remains small. This thesis therefore focuses on human G4-binding proteins: it builds a database to support subsequent studies, analyzes the sequence features of G4-binding proteins, and builds predictive models based on those features.

First, a database of human G4-binding proteins was built. Data were collected through literature review and database searches, and the database was implemented with Django, MySQL, and Bootstrap, with functions such as browsing, searching, and downloading. It contains 273 G4-binding protein entries, each of which includes protein and G-quadruplex information.

Second, the sequence features of G4-binding proteins were analyzed through amino acid composition, difference analysis, and motif prediction. The amino acid composition of the G4-binding protein group was similar to that of the nucleic acid-binding protein group but different from that of the human protein group. Motif prediction found two typical motif patterns: one corresponds to the RGG domain, and the other contains lysine (K), glutamic acid (E), and arginine (R). These results indicate that G4-binding protein sequences have specific characteristics that can serve as model features.

Finally, predictive models of G4-binding proteins were built. G4-binding proteins served as positive samples, human proteins other than G4-binding proteins served as negative samples, and the sequences were used as features. Models were built with support vector machines (SVM) and deep learning (CNN-BiLSTM). The data were split for training and validation at a ratio of 4:1, yielding an SVM model (Accuracy: 0.6667; Precision: 0.7692; Recall: 0.6451; AUC: 0.6565) and a CNN-BiLSTM model (Accuracy: 0.9315; Precision: 0.4286; Recall: 0.7391; AUC: 0.7650), both of which gave good predictive performance. The two types of models are suited to G4-binding protein prediction on small-sample and large-sample data, respectively.

Compared with earlier work that uses an RGG domain score to predict G4-binding proteins, this work is the first to use machine-learning algorithms to construct a prediction model for G4-binding proteins, and it uses sequence information as the prediction feature. This work is innovative and forward-looking.
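
The abstract names three steps that are easy to illustrate in miniature: turning a protein sequence into an amino acid composition vector, splitting data 4:1 for training and validation, and computing accuracy, precision, and recall. The following is a minimal sketch under stated assumptions, not the thesis's actual pipeline. The function names (aa_composition, split_4_to_1, binary_metrics) are hypothetical, and the abstract does not say how sequences were encoded for the SVM or the CNN-BiLSTM.

```python
from collections import Counter
import random

# The 20 standard amino acids, in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence.

    Assumption: composition is the fraction of each standard residue;
    non-standard characters are ignored.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def split_4_to_1(items, seed=0):
    """Shuffle items and split them into training and validation sets at 4:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy RGG-like fragment, purely illustrative (not taken from the database).
    vector = aa_composition("RGGRGG")
    print(dict(zip(AMINO_ACIDS, vector))["G"])

    train, val = split_4_to_1(range(10))
    print(len(train), len(val))

    print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]))
```

Because accuracy counts true negatives while precision does not, a model can show high accuracy together with low precision when negatives heavily outnumber positives in the validation set. The abstract does not state the class ratio, so whether this explains the CNN-BiLSTM figures is an assumption.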

Keywords/Search Tags:

G-quadruplex, G4-binding protein, database, sequence characterization, prediction model

Related items

1	Research On Intelligent Computing-based Methods For Protein-peptide Binding Prediction
2	Computational Approaches for Mutation Phenotype Prediction and Protein Binding Site Characterization
3	Research On Protein-protein Binding Sites Prediction Method Based On Sequence Information
4	Research On Protein-ligand Binding Sites Prediction Based On Sequence Information
5	The Study On DNA-binding Protein Prediction Based On Sequence Information
6	Protein-RNA Binding Prediction Based On Bi-LSTM And DenseNet
7	Research On Protein Metal And Radical Ion-Binding Sites Prediction By Sequence Information
8	Biophysical studies on human telomeric G-quadruplex DNA: Characterization of ligand and protein interactions
9	Protein binding microarrays for the comprehensive characterization of transcription factor binding specificities
10	Analysis And Prediction Of RNA-binding Residues In Protein Molecules