With the abuse of antibiotics, the problem of bacterial resistance has become increasingly serious, and the search for alternative drugs is urgent. Antimicrobial peptides, which occur naturally in animals and plants and to which bacteria rarely develop resistance, may be a feasible solution. Early research relied on traditional experimental methods to identify antimicrobial peptide activity, which was inefficient and costly. Accurate prediction of the various activities of antimicrobial peptides is therefore of great significance for improving the efficiency of activity identification and for developing new drugs. With the development of deep learning, existing research has made significant progress in predicting antimicrobial peptides, but more effective methods are still needed to predict their different activities. In light of this, this thesis applies protein language models and graph neural networks to multi-label antimicrobial peptide prediction tasks, thereby improving prediction performance. Specifically, the main work of this thesis is as follows:

(1) Considering the multiple activities of antimicrobial peptides and the inadequacy of sequence representations based on physicochemical properties, the prediction of antimicrobial peptides is modeled as a multi-label learning task, and a multi-label antimicrobial peptide prediction method based on protein language models is proposed. First, the antimicrobial peptide sequence data is tokenized and passed through the input module to the protein language model to obtain a global feature embedding of the sequence. Then, TextCNN with different convolution kernel sizes is used to capture local information in the sequence features. A weight is added for each label to address label imbalance, and the focal loss function is introduced to strengthen learning for labels with fewer samples. Finally, the effectiveness of this method is demonstrated through experiments.

(2) In view of the inadequacy of using only sequence features in current antimicrobial peptide prediction, a multi-label antimicrobial peptide prediction method combining sequence and structural information is proposed. The corresponding antimicrobial peptide structural data sets are obtained from the protein structure database PDB through sequence alignment, and scalar and vector features are extracted from the structures based on protein structure characteristics. The sequence features from the protein language model are then concatenated with these structural features and input into the graph neural network for training. The effectiveness of adding structural information in multi-label antimicrobial peptide prediction tasks is verified through experiments.

(3) A multi-label antimicrobial peptide prediction platform is designed and implemented based on the Spring Boot and Flask frameworks. A multi-label antimicrobial peptide prediction model based on protein language models is trained on all sequence data under the preset hyperparameters to provide prediction services on the platform. The prediction service is deployed as an interface using Flask, and the website backend is built using Spring Boot, which implements system calls through HttpServlet. This platform realizes fast prediction of the different activities of antimicrobial peptides.

The two methods proposed in this thesis, the multi-label antimicrobial peptide prediction method based on protein language models and the multi-label antimicrobial peptide prediction method that combines sequence and structural information, aim to rapidly predict the biological activity of a given amino acid sequence, so as to improve the screening efficiency of candidate antimicrobial peptides and further promote the development of antimicrobial peptides.
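
The per-label weighting and focal loss mentioned in item (1) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes the standard binary focal loss applied independently to each activity label, scaled by a per-label weight. The function name `weighted_focal_loss`, the default `gamma=2.0`, and the example weights are illustrative assumptions, not the thesis's actual settings.

```python
import math


def weighted_focal_loss(probs, targets, label_weights, gamma=2.0, eps=1e-7):
    """Mean weighted binary focal loss over the labels of one sample.

    probs:         predicted probability for each activity label (assumed sigmoid outputs)
    targets:       0/1 ground-truth value for each label
    label_weights: hypothetical per-label weights, larger for rarer labels
    gamma:         focusing parameter; gamma=0 reduces to weighted cross-entropy
    """
    total = 0.0
    for p, y, w in zip(probs, targets, label_weights):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # Probability the model assigns to the true class of this label.
        p_t = p if y == 1 else 1.0 - p
        # (1 - p_t)^gamma shrinks the loss of labels that are already well classified.
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # Three hypothetical activity labels; the third is treated as rare and weighted higher.
    probs = [0.9, 0.2, 0.4]
    targets = [1, 0, 1]
    weights = [1.0, 1.0, 3.0]
    print(weighted_focal_loss(probs, targets, weights))
```

With `gamma` set to 0 the expression reduces to weighted binary cross-entropy, which makes the role of the focusing term easy to isolate: it reduces the contribution of confidently correct labels so that poorly predicted, typically rarer labels dominate the gradient.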