| In the development of every drug, the constituent compounds must be tested for toxic effects. Cytotoxicity in particular poses a potential hazard to the human body and has caused many drugs to be withdrawn late in development or even after reaching the market, resulting in huge economic losses and irreparable harm to patients. Cytotoxicity testing has therefore become an important part of the drug development process. As the drawbacks of traditional methods have become more apparent and computing technology has advanced rapidly, predicting cytotoxicity by computer simulation has attracted growing interest from toxicology researchers. However, progress in this field has been unsatisfactory, because cytotoxicity datasets come from high-throughput screening and suffer from class imbalance. In this study, several strategies for performing a structure-activity relationship (SAR) study of a cytotoxicity endpoint in the AID364 dataset were explored to address the class-imbalance problem. Random forest and AdaBoost were used as base learners on ten types of molecular fingerprints, and an ensemble method and six data-balancing methods were applied to balance the classes. The ensemble model using the MACCS fingerprint performed best, achieving an area under the curve (AUC) of 85.2±0.35%, a sensitivity of 81.8±0.65%, and a specificity of 76.0±0.12% in five-fold cross-validation, and an AUC of 78.8%, a sensitivity of 55.5%, and a specificity of 78.5% in external validation. The approach also performed well on other datasets of different sizes and degrees of imbalance. A prediction system was then built with this MACCS-fingerprint ensemble model as its core algorithm. Based on functional and non-functional requirements, the system is divided into five modules: registration and login, data preparation, classification prediction, result analysis, and system management. The Bootstrap and SweetAlert frameworks are used to keep the front-end pages clean and attractive, and R and PHP implement the back-end functionality. The entire system is deployed on an Apache server running Linux and has been thoroughly tested to support stable long-term operation. |
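The three reported metrics (sensitivity, specificity, AUC) can all be computed from a model's predicted probabilities. The sketch below is a plain-Python illustration under stated assumptions: label 1 marks the positive (cytotoxic, minority) class, a 0.5 decision threshold is used for sensitivity and specificity, and AUC is computed with the pairwise Mann-Whitney formulation. The function names are hypothetical and not taken from the system described above.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_score: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 is assumed to be the positive class."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == 0:
                tn += 1
            else:
                fp += 1
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present. This pairwise
    version is O(P * N) and is meant for illustration, not large screening sets.
    """
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.2, 0.1]
    sens, spec = sensitivity_specificity(labels, scores)
    print(sens, round(spec, 3))              # 0.5 0.667
    print(round(roc_auc(labels, scores), 3))  # 0.833
```

Because AUC is threshold-free while sensitivity and specificity depend on the chosen cut-off, a model can show a high AUC but a low sensitivity at 0.5, which is one way to read the gap between the cross-validation and external-validation sensitivities reported above.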
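The abstract does not spell out how its ensemble method and data balancing are combined, so the following is only a sketch of one common scheme, assumed for illustration: partition the majority (non-toxic) class into chunks roughly the size of the minority class, pair each chunk with all minority samples, train one base learner per balanced subset, and average the members' probabilities (soft voting). `centroid_learner` is a hypothetical toy stand-in for random forest or AdaBoost, and fingerprints are assumed to be binary bit vectors such as MACCS keys.

```python
import random
from typing import Callable, List, Sequence, Tuple

Fingerprint = Sequence[int]                          # assumed binary bit vector
Model = Callable[[Fingerprint], float]               # returns P(label == 1)
Learner = Callable[[List[Fingerprint], List[int]], Model]


def balanced_subsets(X: Sequence[Fingerprint], y: Sequence[int],
                     seed: int = 0) -> List[Tuple[List[Fingerprint], List[int]]]:
    """Split the majority class (label 0) into chunks, each paired with all minority samples."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    rng.shuffle(majority)
    k = max(1, len(majority) // len(minority))       # number of balanced subsets
    subsets = []
    for j in range(k):
        idx = minority + majority[j::k]              # interleaved chunk of the majority class
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def fit_ensemble(X: Sequence[Fingerprint], y: Sequence[int],
                 learner: Learner, seed: int = 0) -> Model:
    """Train one learner per balanced subset and return a soft-voting predictor."""
    models = [learner(Xs, ys) for Xs, ys in balanced_subsets(X, y, seed)]

    def predict_proba(x: Fingerprint) -> float:
        # Soft voting: average the members' predicted probabilities
        return sum(m(x) for m in models) / len(models)

    return predict_proba


def centroid_learner(Xs: List[Fingerprint], ys: List[int]) -> Model:
    """Hypothetical toy base learner: score by L1 distance to each class centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, lab in zip(Xs, ys) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos_c, neg_c = centroid(1), centroid(0)

    def model(x: Fingerprint) -> float:
        d_pos = sum(abs(a - b) for a, b in zip(x, pos_c))
        d_neg = sum(abs(a - b) for a, b in zip(x, neg_c))
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total   # closer to positives -> higher score

    return model


if __name__ == "__main__":
    X = [[1, 1, 0, 0], [1, 1, 1, 0],                  # minority (toxic) examples
         [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],
         [0, 0, 0, 1], [0, 0, 1, 1], [1, 0, 1, 1]]    # majority examples
    y = [1, 1, 0, 0, 0, 0, 0, 0]
    ensemble = fit_ensemble(X, y, centroid_learner)
    print(ensemble([1, 1, 0, 0]) > 0.5, ensemble([0, 0, 1, 1]) < 0.5)
```

Partitioning rather than discarding the majority class keeps every non-toxic compound in some member's training set, which is the usual motivation for pairing undersampling with an ensemble.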