Research On Feature Extraction Algorithm Of Functional Peptide Prediction Problem Based On BERT Pre-trained Model

Posted on: 2022-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhang
GTID: 2480306758491954
Subject: Automation Technology
Abstract/Summary:
With the advent of the post-genome era, the rapid accumulation of protein sequences has brought new opportunities and challenges to researchers, and introducing computer science into bioinformatics has become inevitable. This data-driven approach opens up many new possibilities for bioinformatics research. Functional peptides are of great significance to the regulation of human life activities, and predicting different types of functional peptides has become a research hotspot in bioinformatics. Human leukocyte antigen (HLA) is a molecule found on the surface of most human cells; it plays an important role in the immune system's defense against foreign cells and in the regulation of immune responses. T cell antigen receptors (TCRs) can recognize HLA-peptide complexes on the surface of cancer cells, allowing cytotoxic T lymphocytes to destroy these cells, so accurate prediction of the binding between HLA-I alleles and tumor antigen peptides will facilitate the rapid development of cancer immunotherapy.

This thesis proposes a functional peptide feature extraction algorithm based on the BERT pre-trained language model. In current research on predicting the binding of HLA class I (HLA-I) molecules to antigenic peptides, feature construction for amino acid sequences relies on traditional sequence scoring functions, which yields only a single type of feature. To break through this limitation of constructing amino acid sequence features with classical machine learning algorithms, this study transfers feature construction techniques from natural language processing to functional peptide prediction. Just as written languages are defined over fixed alphabets, protein sequences are usually formed from combinations of 20 common amino acids, and protein language and natural language share some commonalities in composition and completeness of information. The amino acid sequence of a functional peptide is therefore treated as a sentence of natural language, and each amino acid as a letter. By exploiting these commonalities, latent features of functional peptide sequences are extracted from multiple dimensions, offering a new research approach to functional peptide prediction.

Taking the prediction of binding between HLA-I alleles and tumor antigen peptides as an example task, and according to the different application scopes of prediction models, this thesis proposes a pan-specific model, ProtHLAI, and an allele-specific model, HLAB. The feature extraction module of ProtHLAI uses a cascaded network that combines the ProtBert model with a BiLSTM model and an attention mechanism. In experiments on 26 independent sub-datasets comparing ProtHLAI with eight other prediction tools, the algorithm achieves the best performance on 16 sub-datasets and is the most stable of all the prediction tools. The feature extraction module of HLAB uses a cascaded network that combines the ProtBert model with a BiLSTM model. HLAB covers a total of 360 different specific classification tasks, and a comparison with eight other prediction tools shows that it achieves the best prediction performance in 90% of these tasks.

The experimental results of the ProtHLAI and HLAB models show that: 1. The BERT-based functional peptide feature extraction algorithm proposed in this thesis achieves state-of-the-art performance in this problem domain across different prediction tasks. 2. A cascaded feature extraction model that combines BERT with other models can process the features extracted by BERT more effectively, yielding a feature extraction algorithm better suited to actual downstream tasks.
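
The "peptide as a sentence, amino acid as a letter" idea can be illustrated with a minimal sketch. It is not the thesis implementation: the vocabulary ids, the `[CLS]`/`[SEP]`/`[PAD]`/`[UNK]` layout, the function names, and the example peptide are all assumptions made for illustration, and the attention scores, which a trained model would learn, are passed in by hand. Publicly released protein BERT models such as ProtBert are typically fed residues separated by spaces, which is the format `peptide_to_sentence` produces.

```python
import math
import re
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 common amino acids
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[UNK]"]  # BERT-style markers

# Hypothetical vocabulary: these ids are illustrative, not ProtBert's real ids.
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}


def peptide_to_sentence(peptide: str) -> str:
    """Write a peptide as a space-separated 'sentence' of one-letter residues."""
    cleaned = re.sub(r"\s+", "", peptide.upper())
    return " ".join(cleaned)


def encode(peptide: str, max_len: int = 16) -> List[int]:
    """Map a peptide to fixed-length ids: [CLS] residues [SEP], then padding."""
    tokens = ["[CLS]"] + peptide_to_sentence(peptide).split() + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("peptide is too long for max_len")
    # Any letter outside the 20 common amino acids falls back to [UNK].
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


def attention_pool(vectors: List[List[float]], scores: List[float]) -> List[float]:
    """Softmax-weighted average of per-residue vectors (scores assumed given)."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


if __name__ == "__main__":
    print(peptide_to_sentence("SIINFEKL"))  # arbitrary example peptide
    print(encode("SIINFEKL", max_len=12))
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In a cascaded model of the kind the abstract describes, the ids would be fed to the pre-trained encoder, its per-residue outputs passed through a BiLSTM, and a pooling step such as `attention_pool` would reduce them to one fixed-length feature vector for the classifier.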
Keywords/Search Tags: HLA-I binding peptide prediction, natural language processing, BERT model, functional peptide prediction, feature selection, feature dimensionality reduction