
Research On Protein S-nitrosylation Site Prediction Method Based On Deep Learning

Posted on: 2022-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Ma
GTID: 2480306614467364
Subject: Automation Technology
Abstract/Summary:
Protein post-translational modification refers to the covalent, generally enzymatic, modification of amino acid residues in a polypeptide chain through the addition or removal of chemical groups. Catalysis, regulation, and many other key protein functions throughout the life cycle depend on post-translational modifications. Abnormal post-translational modifications, however, can alter a protein's catalytic activity, its quaternary structure, and the interactions between proteins. Protein S-nitrosylation is one of the most important and common post-translational modifications; it is formed by the covalent attachment of nitric oxide to cysteine residues. Numerous studies have shown that protein S-nitrosylation plays an important role in the immune response of plants, in numerous physiological and pathological processes, and in the treatment of various major human diseases. How to predict S-nitrosylation sites accurately and efficiently has therefore become a hot research topic in recent years. With the completion of the Human Genome Project and the development of high-throughput sequencing technology, however, traditional biochemical experimental methods, which are time-consuming and costly, can no longer meet the needs of large-scale sequence analysis and identification.

To address these problems, this study constructs a deep learning prediction model based on the sequence information of protein S-nitrosylation sites. First, it combines the under-sampling algorithm ENN with the over-sampling algorithm ADASYN to construct a balanced dataset through resampling. Then, the widely used deep learning models BiLSTM and BERT are introduced to extract sequence feature vectors of S-nitrosylation sites. For the mixed feature vector, the MRMD algorithm is used to eliminate noisy features and further improve the performance of the predictor. To guard against over-fitting of the mixed features, 10-fold cross-validation was carried out separately for the single feature extraction methods and for the mixed feature extraction method. Finally, an independent dataset is used for comparison with previous research results. With random forest as the classifier, the best prediction results were obtained, with accuracies of 0.911 on cross-validation and 0.796 on the independent test set.

To facilitate further research, this work integrates these results into an online bioinformatics prediction tool for protein S-nitrosylation sites named Mul-SNO. The tool is freely accessible at http://lab.malab.cn/-mjq/Mul-SNO.
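
The under-sampling half of the resampling step can be illustrated with a minimal sketch of the ENN (Edited Nearest Neighbours) rule. This is not the thesis code. The function name `edited_nearest_neighbours`, the toy 2-D feature vectors, the choice of k = 3, and the majority-vote removal rule are all assumptions made for illustration, since the abstract does not state the parameters used. ADASYN over-sampling, which the study applies alongside ENN, is not shown here.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Drop majority-class samples that most of their k nearest neighbours contradict.

    Illustrative sketch only: minority-class samples are always kept, and a
    majority-class sample is kept when a strict majority of its k nearest
    neighbours (Euclidean distance) share its label.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # never remove minority (positive) samples
            continue
        # Distances from sample i to every other sample, nearest first.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        votes = Counter(y[j] for _, j in dists[:k])
        if votes[yi] * 2 > k:  # strict majority of neighbours agree with the label
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: label 0 = non-modified site (majority), 1 = S-nitrosylated site.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.05, 1.05)]
y = [0, 0, 0, 0, 1, 1, 1, 0]  # the last negative sits inside the positive cluster

X_clean, y_clean = edited_nearest_neighbours(X, y, majority_label=0)
print(Counter(y_clean))
```

In this toy example the negative sample at (1.05, 1.05) is surrounded by positives and is removed, which narrows the class imbalance before any over-sampling is applied. In practice the feature vectors would be the high-dimensional sequence embeddings rather than 2-D points.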
Keywords/Search Tags: S-nitrosylation, Deep learning, BERT, BiLSTM, Random forest