
Research On The Prediction Of Drug Targets Based On Imbalance Data Mining

Posted on: 2018-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: L G Cai
Full Text: PDF
GTID: 2334330512973315
Subject: Computer technology
Abstract/Summary:
The discovery and localization of drug targets are key to the success of drug research. In the post-genome era, the rapid development of chemical genomics and pharmacological techniques has produced a large number of potential targets and a massive amount of biological activity data. Even so, the number of clinically proven drug targets remains very small: only about 500 so far. One reason is that, as redundant data accumulates, simple analytical methods can no longer meet the need for high-throughput, large-scale data analysis. Experimental methods, meanwhile, are limited by throughput, accuracy, and cost, which makes them difficult to apply widely. As a fast, low-cost alternative, drug target prediction based on data mining is receiving more and more attention. Against this background, this thesis studies drug target prediction based on imbalanced data mining, with the aim of accelerating the discovery of drug targets and reducing its cost.

Predicting drug targets from a large number of proteins is a typical class-imbalance problem, and classifier accuracy degrades to varying degrees on such data. At the data level, therefore, a synthetic minority oversampling technique (SMOTE) based on a genetic algorithm is used to preprocess the data. It increases the number of minority-class samples and balances the ratio of drug targets to non-targets. An ensemble learning SVM classifier is then used to predict drug targets. Compared with a single SVM classifier, this method improves the generalization performance of the prediction model.

To demonstrate the effectiveness of the proposed method, this thesis first builds two data sets. One consists of all human protein data; the other consists of human G protein-coupled receptors, which account for a high proportion of drug targets. Primary sequence features, polypeptide features, and physicochemical property features of the proteins are extracted as the feature space for training the classifier. Feature selection reduces the burden of learning the classifier, and the optimal classifier is then constructed by tuning the model parameters. In the experiment construction and analysis section, the SVM classifier and the AdaBoost-SVM classifier are each used to classify the data sets. Running the two classifiers before and after data preprocessing yields four sets of experimental results, which verify the effectiveness of the proposed method. The results show that the proposed method can effectively predict drug targets and can provide a preliminary reference for researchers in drug research and development.
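
The oversampling step described in the abstract can be illustrated with a minimal sketch of plain SMOTE in Python. This is not the genetic-algorithm variant used in the thesis, whose details the abstract does not give. The function name `smote`, its parameters, and the toy two-feature data are all assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (plain SMOTE, illustrative).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: 3 "target" vs 9 "non-target" proteins, 2 features each.
targets = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
non_targets = [[float(x), float(y)] for x in range(3, 6) for y in range(3, 6)]

# Oversample the minority class until both classes have the same size.
new_targets = smote(targets, n_new=len(non_targets) - len(targets), k=2)
balanced_targets = targets + new_targets
```

After this step the two classes are the same size, and the balanced set would then be passed to a classifier such as the AdaBoost-SVM ensemble mentioned above. Because every synthetic point is interpolated between existing minority samples, it stays inside the region those samples already cover.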
Keywords/Search Tags:drug target, data mining, support vector machine, ensemble learning, synthetic minority over-sampling technique