| Drug toxicity is a prominent concern in the pharmaceutical industry, often leading to losses and increased costs during drug development. Among the various types of drug toxicity, carcinogenicity is particularly significant and attracts substantial public attention. Carcinogenic compounds in drugs can severely harm human health, so their cancer risk must be assessed before market release. Traditional cancer risk assessment relies on animal experiments, which are expensive, time-consuming, and subject to many limitations. With continuing advances in computational methods, research has shifted towards using compound features and efficient computational models to predict carcinogenicity. Several computational approaches have been proposed for assessing the carcinogenicity of compounds. However, most of them use only a single type of feature or features with limited expressive power, which restricts their predictive performance and leaves ample room for improvement. In this thesis, we construct DCAMCP, a deep learning model based on a capsule network and an attention mechanism, to discriminate between carcinogenic and non-carcinogenic compounds. Our dataset is derived from three different carcinogenic potency databases, and we train DCAMCP on 1564 distinct compounds represented by their molecular fingerprints and molecular graph features. The trained model is evaluated by 5-fold cross-validation and by external validation. In cross-validation, DCAMCP achieves an average accuracy (ACC) of 0.718±0.009, sensitivity (SE) of 0.721±0.006, specificity (SP) of 0.715±0.014, and area under the receiver operating characteristic curve (AUC) of 0.793±0.012. Comparable results are obtained on an external validation set of 100 compounds, with an ACC of 0.750, SE of 0.778, SP of 0.727, and AUC of 0.811, which demonstrates the reliability of DCAMCP. These results indicate that our model advances computational cancer risk assessment and could serve as an efficient tool in drug design. |
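
The four metrics reported above follow standard definitions: ACC = (TP+TN)/(TP+TN+FP+FN), SE = TP/(TP+FN), SP = TN/(TN+FP), and AUC is the probability that a randomly chosen carcinogen receives a higher score than a randomly chosen non-carcinogen. The minimal Python sketch below computes them from binary labels (1 = carcinogenic) and model scores. It is an illustration, not DCAMCP's own evaluation code: the function names are hypothetical, the 0.5 decision threshold is an assumption, and reading "±" as the sample standard deviation over the five folds is also an assumption, since the abstract does not state it.

```python
from statistics import mean, stdev


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = carcinogenic, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc_rank(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, scores, threshold=0.5):
    """ACC, SE, SP at an assumed threshold, plus threshold-free AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "ACC": (tp + tn) / len(y_true),
        "SE": tp / (tp + fn),  # sensitivity: recall on carcinogens
        "SP": tn / (tn + fp),  # specificity: recall on non-carcinogens
        "AUC": auc_rank(y_true, scores),
    }


def summarize(fold_values):
    """Mean and sample standard deviation of one metric across CV folds (assumed convention)."""
    return mean(fold_values), stdev(fold_values)
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.7, 0.4, 0.6, 0.2, 0.1])` yields ACC, SE, and SP of 2/3 each and an AUC of 8/9. The pairwise AUC loop is O(P·N), which is adequate for a few hundred compounds; scikit-learn's `roc_auc_score` computes the same quantity more efficiently.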