| Gene expression programming (GEP) is a novel evolutionary algorithm that combines genetic algorithms and genetic programming. Its encoding and decoding are simple and flexible, it has high expressive power, its genetic operations are easy to carry out, and it solves complex problems efficiently, so it has been widely adopted in many fields. Compared with traditional mathematical and statistical methods, GEP only needs an appropriate fitness function to evaluate chromosomes and does not require a deep mathematical foundation to accurately describe the relationships in complex data. This is one of the main reasons for its wide application. Significant theoretical results have been achieved for GEP, but they are based on the classical GEP algorithm; whether they also hold for other GEP variants, such as fuzzy controlled multicellular gene expression programming (FMCGEP), needs further investigation. Therefore, this paper analyzes the convergence of FMCGEP from a theoretical perspective and obtains the dependence of the convergence speed on the algorithm parameters, which enriches the theoretical research on GEP. It is well known that drug development is costly in both time and money. Because the properties of compounds are not known in advance, pharmaceutical researchers need to repeat hundreds or thousands of experiments to obtain relatively accurate results. The advent of artificial intelligence algorithms has been a great help to pharmaceutical researchers. Numerous researchers have applied machine learning algorithms to drug discovery and development and have achieved fruitful results. To address the physicochemical properties of compounds involved in drug development, this paper uses the FMCGEP algorithm for integrated classification and regression on compound toxicity data and compound activity data. The main work of
this paper is as follows. (1) We first give formal definitions of the concepts related to the FMCGEP algorithm and analyze the properties of each genetic operator of FMCGEP. Based on these properties, the global convergence of FMCGEP is studied, and its convergence speed is investigated through the spectral correction radius and an analysis of the fuzzy rules of the algorithm. Finally, examples and experiments further confirm that the convergence speed depends on the algorithm parameters and that adding the fuzzy control mechanism accelerates convergence. (2) Using FMCGEP as the secondary learner and machine learning algorithms such as Random Forest Classifier, Extra Trees Classifier, Gradient Boosting Classifier, and Support Vector Machine as primary learners, a fuzzy adaptive multicellular gene expression programming integrated classification algorithm (FCMGEP-EC) is proposed based on the ensemble idea of the stacking algorithm. The method combines the base learners to model and predict the Tox21 toxicity data set. Experimental comparison with the base learners shows that the algorithm achieves better classification accuracy. (3) Using FMCGEP as the secondary learner and Linear Regression, Decision Tree, Elastic Net, Random Forest Regressor, Extra Trees Regressor, Gradient Boosting Regressor, and other machine learning algorithms as primary learners, a fuzzy adaptive multicellular gene expression programming integrated regression algorithm (FCMGEP-ER) is proposed based on the ensemble idea of the stacking algorithm. FCMGEP-ER is used to model and predict compound activity data, with EVS, MAE, MSE, and R² score as evaluation metrics. Comparison with baseline machine learning methods shows the effectiveness of the method. (4) The n-octanol/water partition
coefficient (log P for short) reflects the lipid and water solubility of a substance. Because log P is related to the dissolution, absorption, distribution, and transport of drugs in the human body, accurate and effective prediction of log P is of great significance for drug development and human health monitoring. In this study, an improved Morgan fingerprint fuzzy control-based gene expression programming algorithm (FCMGEP2log P) is proposed to improve log P prediction accuracy. Experimental results based on RMSE and MAE show that the method outperforms not only multicellular gene expression programming but also BP neural network, support vector regression, random forest regression, and Gaussian process regression methods. |
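
The regression metrics named above (EVS, MAE, MSE, RMSE, and the R² score) can be illustrated with a minimal sketch. It assumes the standard textbook definitions; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper's experiments.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return EVS, MAE, MSE, RMSE and R2 for two equal-length sequences.

    Assumes y_true is not constant (its variance must be non-zero).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error

    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n

    evs = 1.0 - var_err / var_true          # explained variance score
    r2 = 1.0 - mse / var_true               # coefficient of determination
    return {"EVS": evs, "MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": r2}


# Hypothetical values: every prediction is too high by exactly 0.5.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this illustrative case EVS is 1 while R² is about 0.8. EVS ignores a constant offset in the predictions and R² penalizes it, which is one reason to report both alongside MAE and MSE.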