Research on Deep Learning-Based Methods for Cell-Penetrating Peptide Prediction

Posted on: 2024-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2544306917497064
Subject: Software engineering
Abstract/Summary:
Peptides are essential components of biological cells and play an important role in many biological functions. In recent years, peptides have become important in drug research because of their advantages as drug molecules in selectivity, affinity and safety, as well as their specific targeting and rich biological activity. However, a major difficulty in developing peptide-based drugs is transporting drug molecules efficiently across the cell membrane without damaging the cell. Cell-penetrating peptides (CPPs) have received considerable attention because they can carry pharmacologically active molecules into living cells without disrupting the cell membrane. CPPs are small peptides of no more than 30 amino acids that can be used in tumour therapy, inflammation targeting, and cytostatic and fungicidal applications, and their low toxicity meets the needs of drug discovery. Accurately identifying CPPs has therefore become a key issue in computational peptide drug screening.

Researchers have recently proposed several machine learning methods for predicting CPPs and achieved very good results, but the following problems remain:
(1) Traditional manual feature extraction is tedious. Existing methods rely heavily on manually engineered features.
(2) Sequence features are extracted inadequately. Most existing methods ignore both the consistency among similar CPPs and the differences between dissimilar ones.
(3) Feature representation is incomplete. Existing methods generally extract features from a single perspective, ignoring the diversity and comprehensiveness of the feature representation.

To address these problems, this thesis proposes three deep learning-based prediction methods that extract sequence-based CPP features from different perspectives, optimise prediction results and improve CPP prediction performance. The main contributions of this thesis are as follows:

(1) CPPFormer, a CPP prediction method based on sequences and the attention mechanism, is proposed to solve the problem of cumbersome feature extraction. Built on the "encoder-decoder" paradigm, the Transformer model has become mainstream in natural language processing thanks to its superior performance, and it has had a huge impact on deep learning as a whole. Bioinformatics, for its part, has embraced machine learning and made tremendous progress in drug design and protein property prediction. CPPs are membrane-permeable peptides that act as "postmen" in drug delivery tasks. However, only a small number of CPPs have been discovered and studied, let alone applied in practical drug delivery. Correctly identifying CPPs therefore opens up a new pathway for macromolecular drugs to enter cells without other potentially harmful agents. Most previous work has used only conventional machine learning techniques and hand-crafted features to build a simple classifier. CPPFormer borrows the attention structure of the Transformer, restructures the network to suit the short length of CPPs, and combines an automatic feature extractor with a few hand-designed features to guide prediction. Compared with all previous approaches and with other classical text classification models, the empirical results show that the proposed deep model achieves the best performance, 92.16%, on the CPP924 dataset and performs well across multiple evaluation metrics.

(2) SiameseCPP, a CPP predictor based on contrastive learning and pre-training, is proposed. It is a novel deep learning framework that predicts CPPs automatically from features extracted directly from primary sequences. SiameseCPP builds a representation of CPPs using a pre-trained model and a Siamese (twin) neural network consisting of a Transformer and a gated recurrent unit (GRU). This is the first time contrastive learning has been used to build a predictive model for CPPs. Comprehensive experiments show that SiameseCPP outperforms existing baseline models in predicting CPPs. SiameseCPP also performs well on other functional peptide datasets, showing satisfactory generalisation ability.

(3) AtomCPP, a CPP prediction method based on sequence information and atomic structure information, is proposed. It exploits both the atom-level graph features of amino acids and their sequence information. Unlike existing methods, the model can make full use of sequence structure, potential, atomic topology and functional group information. Experimental results show that AtomCPP achieves an accuracy of 93.41% without requiring a time-consuming pre-training process. Ablation experiments were also performed to analyse the effect of structural features and sequence information on predictions, and the feasibility of AtomCPP was demonstrated on other datasets.
Keywords/Search Tags: cell-penetrating peptide, Transformer, contrastive learning, pre-training, atomic structure
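
The abstract does not give the loss function or encoder details used by SiameseCPP, so the following is only a minimal sketch of the general idea behind contrastive learning on peptide pairs, not the thesis's implementation. It assumes the classic pairwise contrastive loss (squared distance for same-class pairs, squared margin shortfall for different-class pairs) and replaces the Transformer/GRU encoder with a toy amino-acid composition vector. The function names, the margin value and the non-CPP sequence are hypothetical; the two CPP sequences are widely cited examples used purely as illustrative input.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_embedding(sequence):
    """Toy stand-in for a learned encoder: amino-acid frequency vector.

    Assumption: the real model would use a pre-trained Transformer/GRU
    encoder here; a composition vector just keeps the sketch runnable.
    """
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(distance, same_class, margin=1.0):
    """Pairwise contrastive loss (assumed form, not confirmed by the source).

    Same-class pairs are penalised by their squared distance, so they are
    pulled together; different-class pairs are penalised only while they
    are closer than `margin`, so they are pushed apart.
    """
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Illustrative input only: two well-known CPPs and one made-up non-CPP.
tat = composition_embedding("GRKKRRQRRRPPQ")
penetratin = composition_embedding("RQIKIWFQNRRMKWKK")
non_cpp = composition_embedding("ADEGLSTVADEG")  # hypothetical sequence

loss_same = contrastive_loss(euclidean_distance(tat, penetratin), same_class=True)
loss_diff = contrastive_loss(euclidean_distance(tat, non_cpp), same_class=False)
```

In a trainable version, the two embeddings of each pair would come from the same weight-shared encoder, and the summed loss over pairs would be minimised by gradient descent; that weight sharing is what makes the network "Siamese".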