
Design Of Phage Hydrolase System Based On Feature Fusion

Posted on: 2021-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: H F Li
Full Text: PDF
GTID: 2480306197495834
Subject: Master of Engineering
Abstract/Summary:
Bacteriophages help fight bacterial pathogens, especially those that cannot be killed by antibiotics or chemicals. Accurate identification of phage enzymes and phage-encoded hydrolases is therefore of great value for treating bacterial diseases. Biochemical methods can identify phage hydrolases accurately and elucidate how the enzymes work, but they are time-consuming and expensive. To address this problem, we use feature fusion and machine learning (ML) algorithms to construct a phage hydrolase prediction system. The main work of this thesis is as follows:

(1) Extracting numerical features from the samples and fusing them with analysis of variance (ANOVA). Phage enzyme and phage hydrolase samples were first encoded as four numerical features: g-gap dipeptide composition (GGDP), pseudo-amino acid composition (PseAAC), grouped tripeptide composition (GTPC), and composition, transition, and distribution information (CTD). Because each feature on its own represents the samples poorly, the four features were concatenated into three groups of combined vectors. The first group is GGDP and PseAAC; the second group is GTPC and CTD; the third group is GGDP, PseAAC, GTPC, and CTD. ANOVA was then used to remove redundant information from each combined vector, so that the features are fused rather than merely concatenated. The best fusion feature for the phage enzyme samples was the third group, and the best for the phage hydrolase samples was the first group.

(2) Building the phage enzyme prediction model based on feature fusion. For a newly sequenced phage protein, the first step is to judge whether the protein is an enzyme; if it is predicted to be a phage enzyme, the next step is to judge whether that enzyme is a hydrolase. The phage enzyme prediction model was built with a support vector machine (SVM) and evaluated with leave-one-out cross-validation, using the third set of fusion features as its input vector. Its Matthews correlation coefficient, area under the receiver operating characteristic (ROC) curve, and overall accuracy are 0.703, 89.7%, and 85.1%, respectively.

(3) Building the phage hydrolase prediction model based on feature fusion. Once a newly sequenced phage protein has been classified as a phage enzyme, the second step is to judge whether it is a hydrolase. Like the enzyme model, the hydrolase model was built with an SVM and evaluated with leave-one-out cross-validation; in contrast to the enzyme model, its input vector is the first set of fusion features. The model correctly identifies 93% of phage hydrolases and 96% of other phage enzymes, with an overall accuracy of 94.3%.

(4) Developing an online prediction system for phage hydrolases. The phage enzyme and phage hydrolase prediction models were integrated into an online prediction system (www.predic.top) built with Python and the Flask framework. In the prediction module, users enter sequences in FASTA format to quickly and accurately distinguish phage hydrolases from non-hydrolases. The training module integrates four ML algorithms so that interested users can train their own models. Experimental data can be downloaded through the manual module, and users can send questions about the experiments or the system by email through the contact module.
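The feature extraction and ANOVA-based selection described in item (1) can be illustrated with a small sketch. This is not the thesis code: the function names (`g_gap_dipeptide`, `anova_f`, `rank_features`) are hypothetical, the gap value `g` and the number of features to keep are not given in the abstract, and ranking features by a per-feature one-way ANOVA F value is only one plausible reading of "analysis of variance algorithm". Only the Python standard library is used.

```python
from itertools import product
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def g_gap_dipeptide(seq, g=0):
    """Frequencies of residue pairs separated by g positions (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def anova_f(pos, neg):
    """One-way ANOVA F value of a single feature across two classes."""
    n_pos, n_neg = len(pos), len(neg)
    grand = mean(pos + neg)
    m_pos, m_neg = mean(pos), mean(neg)
    # Between-group mean square (two groups, so one degree of freedom).
    between = n_pos * (m_pos - grand) ** 2 + n_neg * (m_neg - grand) ** 2
    # Within-group mean square (n_pos + n_neg - 2 degrees of freedom).
    within = sum((x - m_pos) ** 2 for x in pos) + sum((x - m_neg) ** 2 for x in neg)
    within /= n_pos + n_neg - 2
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(pos_vectors, neg_vectors):
    """Feature indices sorted from most to least discriminative (by F value)."""
    n_features = len(pos_vectors[0])
    scores = [
        anova_f([v[j] for v in pos_vectors], [v[j] for v in neg_vectors])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

Under these assumptions, a fused feature set would be the top-ranked columns of the concatenated vectors (for example GGDP plus PseAAC), which could then be passed to an SVM and scored with leave-one-out cross-validation as in items (2) and (3).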
Keywords/Search Tags: Bacteriophage enzymes, Hydrolase, Analysis of variance, Sequence feature, Machine learning