Machine Learning Approaches For Drug-Target Interaction Prediction

Posted on: 2021-05-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S. M. Hasan Mahmud
GTID: 1364330647960888
Subject: Computer Science and Technology
Abstract/Summary:
Accurate identification of drug-target interactions (DTIs) is a crucial and challenging task in the drug discovery process, with enormous benefit to patients and pharmaceutical companies. Traditional wet-lab experiments for identifying DTIs are expensive, time-consuming, and labor-intensive; hence, it is imperative to establish computational methods that predict potential DTIs in a timely manner. Computational approaches can identify new interactions (drug-target pairs) and accelerate drug repurposing. In this thesis, we investigate techniques that identify new interactions based on prior knowledge of existing drugs and their experimentally confirmed targets. We also identify and address major problems in DTI prediction; having addressed these problems, we were able to boost prediction performance and outperform related methods.

To date, multiple computational techniques have been presented to simplify the drug discovery process, but a huge number of interactions remain undiscovered. Class imbalance is a critical challenge that can significantly degrade classification accuracy and has not yet been effectively addressed. The number of drug-target features and interactions is also increasing, which limits the predictive and analytical ability of traditional computational methods. Furthermore, accurate prediction also depends on the negative drug-target pairs, so it is worthwhile to build a technique that generates effective negative pairs.

Firstly, DTI prediction methods have difficulty discovering interactions involving targets or drugs for which there is no effective feature representation of drug-target pairs. To predict interactions, we propose a novel high-throughput computational model for identifying DTIs based on drug chemical structures and protein sequences. More specifically, features are extracted from the protein sequence through position-specific scoring matrix (PSSM)-Bigram, amphiphilic pseudo amino acid composition (AM-PseAAC), and dipeptide PseAAC descriptors, which represent evolutionary and sequence information. The drug chemical structure is represented as a molecular substructure fingerprint (MSF), which describes the presence of functional fragments or groups. In addition, we used the SMOTE over-sampling technique to overcome the imbalance of the datasets and applied the XGBoost algorithm as a classifier to predict DTIs. The experimental analysis shows that our model outperforms similar methods in terms of area under the ROC curve (auROC).

Secondly, in some DTI datasets the feature sets representing the drugs and targets are of high dimensionality. High dimensionality may lead to much longer running times for the prediction task and to degraded prediction performance, so developing a robust model that derives reduced features for effective prediction is of significant importance. Therefore, we propose a new multilabel algorithm that introduces a multi-kernel learning (MKL)-based SVM for DTI prediction with various dimensionality reduction techniques. To calculate and select the top-ranked drugs and targets, we developed a Cluster-Based Molecular Similarity (CluMS) algorithm. CluMS starts with the given drug or target features. Then, three dimensionality reduction techniques are applied to the extracted drug-target features. Finally, we trained a multi-kernel-based learner on the reduced features and combined the prediction scores to produce the final results.

Thirdly, class imbalance is prevalent across all DTI datasets. Therefore, our proposed method uses cluster under-sampling (CUS) to balance the data and develops a novel feature eliminator, EnsRFS, to extract the optimal features from drug-protein datasets, increasing prediction efficiency. More specifically, each drug molecule is transformed into a substructure fingerprint that retains certain functional fragments of the chemical structure. For a protein sequence, different descriptors are used to represent its evolutionary, sequence, and structural information. The experimental results also yield new candidate drug-target interactions based on prediction probability scores, which can motivate researchers toward further drug development.

Finally, a concept called differential representation bias affects the prediction performance of DTI prediction methods. Specifically, differential representation bias refers to how often a drug (or target) appears in the positive training data as opposed to the negative data. To address these problems, we developed a technique, MMIB, to handle the majority and minority instances in the dataset and used a LASSO model to project the features into a low-dimensional space. In addition, we trained a convolutional neural network on the balanced and reduced features for accurate prediction of DTIs.
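
As a rough illustration of the differential representation bias described above, the following Python sketch counts how often each drug appears in positive versus negative training pairs. The abstract gives no formula, so the "positive fraction" computed here is an assumed, illustrative measure rather than the thesis's definition; the function name `representation_bias` and the toy drug and target IDs are hypothetical.

```python
from collections import Counter


def representation_bias(positive_pairs, negative_pairs):
    """Map each drug to the fraction of its training pairs that are positive.

    This positive fraction is an assumed, illustrative measure; the thesis
    may quantify differential representation bias differently.
    """
    pos = Counter(drug for drug, _target in positive_pairs)
    neg = Counter(drug for drug, _target in negative_pairs)
    bias = {}
    for drug in set(pos) | set(neg):
        total = pos[drug] + neg[drug]  # always >= 1 for drugs seen in either set
        bias[drug] = pos[drug] / total
    return bias


# Toy (drug, target) pairs; the IDs are made up for illustration.
positive_pairs = [("D1", "T1"), ("D1", "T2"), ("D1", "T3"), ("D2", "T1")]
negative_pairs = [("D1", "T4"), ("D2", "T2"), ("D2", "T3"), ("D2", "T4"), ("D3", "T1")]

bias = representation_bias(positive_pairs, negative_pairs)
for drug in sorted(bias):
    print(drug, bias[drug])
```

A drug whose fraction is close to 1 or 0 is seen almost exclusively in one class, so a classifier could learn to predict from the drug's identity alone rather than from the interaction itself. Swapping the tuple positions gives the same measure for targets.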
Keywords/Search Tags: Drug-target interactions (DTIs), Drug discovery, Boosting classifier, Feature extraction, High dimensionality, Data imbalance