Complex diseases such as cancer are systemic diseases caused by abnormalities in multiple genes. Gene mutation, copy number variation, and imbalanced gene expression can be either causes or consequences of tumors. The traditional drug development strategy of "one drug, one target" has proven deficient: it is prone to drug resistance and side effects. Moreover, the robustness of biological systems makes it difficult to completely inhibit the growth of tumor cells with a single drug. Tumor treatment therefore requires drugs that act on multiple targets and signaling pathways to block the generation, transmission, and function of disease signals from different targets and locations. Drug combination is one of the most commonly used strategies in multi-target therapy because of its low toxicity and high efficacy, so it is important to build a model or workflow for predicting synergistic drug combinations.

The traditional way to screen for synergistic drug combinations is experimental screening. However, experimental screening has serious shortcomings, such as the high cost of developing new drugs and low efficiency. By contrast, screening for synergistic drug combinations with in silico methods offers high throughput, higher efficiency, lower cost, and less pollution. With the massive accumulation of omics data and the rapid development of bioinformatics technologies, in silico methods may come to play a more important role than experimental screens in identifying synergistic drug combinations. For cancer researchers, constructing an effective model for predicting synergistic anti-cancer drug combinations is therefore worthwhile.

Over the past decade, in silico prediction of synergistic drug combinations has become increasingly popular, and many computational models have been proposed. According to the machine learning methods used in model construction, we classified these models as
unsupervised, semi-supervised, and supervised models. Random forest is a common machine learning algorithm for building supervised models. To build a random forest model, researchers must collect labeled samples as a training dataset, design reasonable features, and score these features on the labeled samples. Features are the key to a prediction model. However, the most commonly used features have almost all been based on drug phenotype information, such as drug targets and chemical structure. These features are important drug properties, but they reflect a drug's mechanism of action only to a limited extent. Pharmacogenomics data (i.e., gene expression profiles following drug treatment) reflect the perturbation effect of drugs on cell lines and thereby further reflect their pharmacological properties. Thus, both drug phenotype data and pharmacogenomics data should be considered when designing features.

This study is based on the known synergistic, additive, and antagonistic drug combinations from DREAM Challenge 7 sub-challenge 2, together with the corresponding drug phenotype data and gene expression profiles. We designed and analyzed twenty-one features, comprising drug phenotype features and pharmacogenomics features. The optimal prediction model was obtained by using a supervised learning algorithm (random forest) for feature selection and model building. We then screened the Connectivity Map for anticancer drugs approved by the U.S. Food and Drug Administration (U.S. FDA). The phenotype data and corresponding gene expression profiles of these anticancer drugs served as the test dataset, to which the optimal model was applied to predict synergistic drug combinations and to evaluate its predictive capacity.

On the training dataset, the optimal prediction model had an out-of-bag error rate of 0.15 and an area under the curve (AUC) of 0.89. Drugs that satisfy the above
mentioned condition in the Connectivity Map formed a total of 187 anticancer drug combinations. Of these 187 combinations, twenty-eight were predicted to be potentially synergistic by the optimal prediction model (SyDRa). By searching the literature in public databases, we found that three of these combinations had already been reported as effective for cancer treatment: azacitidine and thalidomide, imatinib and paclitaxel, and streptozocin and carmustine. This suggests that SyDRa can distinguish synergistic from non-synergistic drug combinations.

In this study, we analyzed and screened the important features for predicting synergistic drug combinations, including drug phenotype features and pharmacogenomics features. We constructed a supervised learning model for predicting synergistic drug combinations, which can serve as a reference for the preliminary, large-scale screening of synergistic drug combinations.
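
The two quantities reported above, an error rate and an AUC, and the enumeration of candidate drug pairs can be illustrated with a minimal Python sketch. It is not the SyDRa implementation and does not train a random forest. The function names (`candidate_pairs`, `auc`, `error_rate`) and all inputs are hypothetical, and the AUC is assumed to be the area under the ROC curve, computed here from its pairwise-ranking definition.

```python
from itertools import combinations


def candidate_pairs(drugs):
    """Return every unordered pair of distinct drugs."""
    return list(combinations(sorted(set(drugs)), 2))


def auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking definition.

    labels: 1 for synergistic, 0 for non-synergistic.
    scores: model scores, higher meaning "more likely synergistic".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative score pairs ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def error_rate(labels, predictions):
    """Fraction of misclassified samples (1 - accuracy)."""
    wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
    return wrong / len(labels)


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the study.
    drugs = ["drug_a", "drug_b", "drug_c", "drug_d"]
    print(len(candidate_pairs(drugs)))  # 4 drugs give 6 unordered pairs

    labels = [1, 1, 0, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.2, 0.1]
    print(auc(labels, scores))
    print(error_rate(labels, [1, 0, 1, 0, 0]))
```

In a random forest, the out-of-bag error is this same error rate, computed for each training sample using only the trees that did not see that sample during bootstrap sampling.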