The spread of antibiotic resistance has become one of the most urgent threats to global health. For safety regulation, identifying antibiotic resistance genes in bacteria is of great significance. With the rapid development of whole-genome sequencing technology, researchers have proposed various machine learning and deep learning methods for predicting antibiotic resistance genes. Although these works have made progress, the following challenges remain: (1) The prediction of multidrug resistance genes is incomplete and cannot provide a comprehensive reference for clinical treatment; (2) Protein structure determines protein function, but structural data are unavailable for most known resistance genes, so most existing prediction methods use only sequence data for feature representation; (3) Existing methods that simultaneously predict the antibiotic class, resistance mechanism, and gene transfer mode do not solve the "seesaw" problem in multi-task learning, where improving performance on one task tends to degrade another, so there is still room for improvement.

In view of this, this thesis improves the prediction of antibiotic resistance genes with deep neural network techniques, using multi-label learning, dual-view modeling, and multi-task learning. Specifically, the main contributions of this thesis are as follows:

(1) To address the inability of existing methods to predict multidrug resistance genes effectively, this thesis models antibiotic resistance gene prediction as a multi-label learning problem and designs a multi-label loss function based on partial order pairs to capture the correlation between the different antibiotic classes to which multidrug resistance genes confer resistance. This fully exploits the advantages of multi-label learning and makes the learned model more generalizable. In addition, to address the shortage of structural data, which leads most existing methods to use only sequence data for feature representation, this thesis uses AlphaFold2 to predict the missing structures in the dataset and introduces a dual-view modeling mechanism that exploits the semantic correlation between sequence-based and structure-based features, thus obtaining more meaningful feature representations. Extensive experiments on the dataset constructed in this thesis verify the effectiveness of the proposed method.

(2) To address the "seesaw" phenomenon in existing multi-task learning methods for predicting the antibiotic class, resistance mechanism, and gene transfer mode, this thesis proposes a multi-task learning framework for antibiotic resistance gene prediction based on a gate-control mechanism. Specifically, task-specific experts and shared experts are explicitly separated to learn private features and shared features, respectively. A gate-control network dynamically fuses the private and shared features to extract the most effective feature representations, thereby improving the generalization ability and prediction performance of the model. Experimental results on HMD-ARG-DB, currently the most comprehensive antibiotic resistance gene dataset, demonstrate the effectiveness and generalization ability of the proposed method.
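
The gate-controlled fusion described in contribution (2) can be illustrated with a small sketch. This is not the thesis's implementation: it assumes each expert is a single randomly initialized linear layer with ReLU, that each task has one softmax gate over its own experts plus the shared experts, and that there is no training loop. The class name `GatedMultiTaskLayer`, the task names, and all dimensions are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_matrix(rows, cols, rng):
    """Random weights; a real model would learn these by gradient descent."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GatedMultiTaskLayer:
    """Hypothetical layer: shared and task-specific experts fused by per-task gates."""

    def __init__(self, in_dim, out_dim, tasks, n_shared=2, n_specific=1, seed=0):
        rng = random.Random(seed)
        self.tasks = list(tasks)
        # Shared experts are visible to every task (shared features).
        self.shared = [make_matrix(out_dim, in_dim, rng) for _ in range(n_shared)]
        # Task-specific experts are visible to one task only (private features).
        self.specific = {
            t: [make_matrix(out_dim, in_dim, rng) for _ in range(n_specific)]
            for t in self.tasks
        }
        # One gate per task, scoring that task's private experts plus the shared ones.
        self.gates = {
            t: make_matrix(n_specific + n_shared, in_dim, rng) for t in self.tasks
        }

    def gate_weights(self, task, x):
        """Input-dependent mixing weights for one task; they sum to 1."""
        return softmax(linear(self.gates[task], x))

    def forward(self, x):
        """Return one fused feature vector per task."""
        shared_out = [[max(0.0, h) for h in linear(w, x)] for w in self.shared]
        fused = {}
        for t in self.tasks:
            private_out = [
                [max(0.0, h) for h in linear(w, x)] for w in self.specific[t]
            ]
            experts = private_out + shared_out  # same order as the gate rows
            g = self.gate_weights(t, x)
            fused[t] = [
                sum(gi * e[j] for gi, e in zip(g, experts))
                for j in range(len(experts[0]))
            ]
        return fused


if __name__ == "__main__":
    tasks = ["antibiotic_class", "resistance_mechanism", "transfer_mode"]
    layer = GatedMultiTaskLayer(in_dim=4, out_dim=3, tasks=tasks)
    features = layer.forward([0.1, -0.2, 0.3, 0.4])
    for name, vec in features.items():
        print(name, [round(v, 4) for v in vec])
```

Because each task's gate never sees another task's private experts, a task can lean on its own experts when the shared ones are unhelpful, which is the intuition behind separating the two expert groups to reduce the seesaw effect.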