The Research Of Prediction For Bioactive Peptides Based On End-to-End Model

Posted on: 2023-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: C Huang
Full Text: PDF
GTID: 2530307022997579
Subject: Software engineering
Abstract/Summary:
Predicting bioactive peptides is of great significance to medical and food research. At present, machine-learning methods for active peptide prediction mainly target specific classes of active peptides. Their biological features are mostly generated by prediction programs or constructed manually before a model is built, so the results depend on the accuracy of the feature prediction and on relevant biological background knowledge. Many active peptides are derived from precursor proteins, yet work on accurately locating peptide fragments within precursor proteins is relatively scarce, and end-to-end models for this task are scarcer still. In view of this, an end-to-end model was proposed to predict active peptides directly from precursor proteins. The main research results are as follows.

A relatively complete data set of precursor proteins containing active peptide fragments was constructed from the UniProt protein database. The data cover various types of active peptides whose arrangements are complex, including multi-fragment, continuous and overlapping arrangements.

Based on a large-scale pre-trained protein language model rather than manually constructed features, an end-to-end model for bioactive peptide prediction, named Esm-Fcn, was proposed. To address the prediction of active peptides with continuous and overlapping arrangements, the model was optimized in terms of sequence annotation and decoding, yielding the Esm-CRF and Esm-span models.

In addition, a multi-task learning architecture was used to train active peptide sequence prediction and multi-label prediction of active peptides at the same time. The two tasks complement each other and improve on the single-task results. The multi-task model predicts the active peptides and their corresponding categories from a precursor protein sequence in a single inference pass.

To avoid the contingency of a particular data split, 7-fold cross-validation was used for model training. Among the Esm-Fcn, Esm-CRF and Esm-span models, Esm-CRF achieved the best results. Esm-CRF handles the prediction of continuous active peptides well, and Esm-span solves the prediction of overlapping active peptides to a certain extent. With multi-task learning, both multi-label prediction of active peptides and prediction of active peptide sequences improved by 1% compared with single-task training. Compared with the other method for predicting active peptides from precursor proteins, the Esm-Fcn, Esm-CRF and Esm-span models all show significant improvements.
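The abstract distinguishes continuous and overlapping peptide arrangements at the sequence annotation and decoding stage. The sketch below illustrates why that distinction matters, under the assumption of a per-residue B/I/O tagging scheme; the abstract does not state which scheme the thesis uses, and the function names `spans_to_bio` and `bio_to_spans` as well as the example spans are hypothetical. Two adjacent peptides remain separable because each one opens with its own "B" tag, whereas overlapping peptides cannot be written as a single tag sequence, which is one plausible motivation for a span-based formulation.

```python
from typing import List, Tuple

# Half-open [start, end) residue indices within a precursor sequence.
Span = Tuple[int, int]


def spans_to_bio(length: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping peptide spans as per-residue B/I/O tags.

    Assumed scheme (not taken from the thesis): "B" marks the first residue
    of a peptide, "I" the remaining residues, "O" everything else.
    """
    tags = ["O"] * length
    for start, end in sorted(spans):
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("overlapping spans cannot share one BIO sequence")
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode per-residue B/I/O tags back into peptide spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new peptide starts; close the previous one if it is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        # "I" extends the open peptide; a stray "I" with none open is ignored.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    # Two peptides arranged back to back in a 12-residue precursor (made-up data).
    tags = spans_to_bio(12, [(2, 5), (5, 9)])
    print("".join(tags))
    print(bio_to_spans(tags))
```

A tagger with a structured decoder such as a CRF would output the tag sequence that `bio_to_spans` consumes. Calling `spans_to_bio` with overlapping spans such as `(2, 6)` and `(4, 9)` raises `ValueError`, which is the case a span-based model would need to cover.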
Keywords/Search Tags: Bioactive peptides, Active peptide prediction, Pretraining, Multi-task learning