Disease is a constant threat to human life and health, and the development of new drugs is key to overcoming medical problems such as cancer and chronic diseases. As the starting point of drug development, computer-aided drug screening uses computers to rapidly identify target-specific candidate drug molecules from large compound libraries, which can effectively shorten the R&D cycle and reduce R&D costs. With the rise of data mining, using machine learning to improve the efficiency and effectiveness of drug screening has become an important topic in current research. To this end, this study uses machine learning to optimize the drug screening process. For the two key problems in drug screening, namely discovering active drug molecules and determining their medicinal properties, we combine ensemble learning and multi-label classification algorithms to construct a drug target activity prediction model and a drug property determination model, respectively, and use intelligent optimization algorithms to mine the structural information of the best-performing drug molecules. Finally, experiments on anti-breast cancer drug data collected from the DrugBank database verify the effectiveness and advancement of the method.

Firstly, to further improve the accuracy of drug target activity prediction, this study proposes a prediction model based on ensemble machine learning. To begin, recursive feature elimination (RFE) is used for feature selection, and the SHAP method is used to calculate feature importance, improving both the interpretability of the model inputs and the prediction accuracy. Next, the best combination of primary learners and meta-learner for the Stacking ensemble model is selected through heterogeneity analysis and performance comparison of 10 machine learning learners. Finally, an adaptive step-size firefly algorithm is used to
optimize the parameters, further improving the predictive ability of the model. The experimental results show that our method significantly outperforms other popular machine learning algorithms.

Secondly, to make full use of the relationships between labels, this study proposes an improved classifier chain (CC) model to better determine the druggability of drug molecules. First, Pearson correlation analysis reveals potential links between the drug properties. Then, to address the low performance and instability caused by the random chain order of the CC model, a label order optimization strategy is proposed: a label co-occurrence matrix is constructed through co-occurrence analysis, and the training order of labels in the CC model is determined greedily by quantifying each label's degree of contribution. Experiments show that this method outperforms 11 other popular multi-label classification algorithms, as well as the CC model and the LOCC model based on it.

Thirdly, building on the above two models, a multi-objective optimization model is proposed for discovering the optimal feature intervals of drug molecules. The model takes the highest target activity value and the greatest satisfaction of drug properties as its optimization objectives. The NSGA-II algorithm is used to solve the model, and its crossover operator is modified to strengthen the algorithm's search ability. The experiments identified a total of 236 virtual drug molecules with better activity values and drug-likeness properties, and the optimal value ranges of important features were determined through numerical statistical analysis, providing auxiliary support for the subsequent design of drug molecular structures.

In summary, this paper uses machine learning to establish a prediction model for target activity and a determination model for drug-likeness properties, and establishes a multi-objective optimization model to mine the structural information of the
optimal drug molecules. Together, these models improve the performance of auxiliary drug screening methods in three respects: preliminary screening, fine screening, and mining of drug structural information. The work offers some guidance and practical significance for reducing the cost and improving the efficiency of new drug research and development.
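
The label order optimization strategy summarized above can be illustrated with a small sketch. The abstract does not give the exact formula for a label's "degree of contribution", so the scoring rule below is an assumption made for illustration only: the chain starts with the label that co-occurs most with all others, then repeatedly appends the remaining label that co-occurs most with the labels already placed. The toy label matrix `Y` and the function names are hypothetical and not taken from the paper.

```python
from itertools import combinations


def cooccurrence_matrix(Y):
    """Count, for every label pair (i, j), the samples in which both are 1."""
    n_labels = len(Y[0])
    C = [[0] * n_labels for _ in range(n_labels)]
    for row in Y:
        for i, j in combinations(range(n_labels), 2):
            if row[i] and row[j]:
                C[i][j] += 1
                C[j][i] += 1
    return C


def greedy_label_order(C):
    """Greedy chain order from a co-occurrence matrix.

    Assumed scoring (not the paper's exact rule): a candidate's
    contribution is its summed co-occurrence with the labels already
    placed in the chain. Ties go to the lowest label index.
    """
    remaining = set(range(len(C)))
    # Seed the chain with the label that co-occurs most overall.
    first = max(sorted(remaining), key=lambda i: sum(C[i]))
    order = [first]
    remaining.remove(first)
    while remaining:
        nxt = max(sorted(remaining),
                  key=lambda i: sum(C[i][j] for j in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical binary label matrix: 6 molecules x 4 property labels.
Y = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

C = cooccurrence_matrix(Y)
order = greedy_label_order(C)
print(order)  # [2, 0, 3, 1]
```

An order computed this way could then be supplied to a classifier chain implementation in place of a random order, for instance through the `order` parameter of scikit-learn's `ClassifierChain`; whether the study used that library is not stated in the abstract.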