Research On The Prediction Of Drug Targets Based On Imbalance Data Mining

Posted on:2018-09-30

Degree:Master

Type:Thesis

Country:China

Candidate:L G Cai

GTID:2334330512973315

Subject:Computer technology

Abstract/Summary:

The discovery and location of drug targets are key to successful drug research. In the post-genomic era, the rapid development of chemical genomics and pharmacological techniques has produced a large number of potential targets and massive amounts of biological activity data. Nevertheless, the number of clinically validated drug targets remains very small: only about 500 so far. One reason is that, as redundant data accumulate, simple analytical methods can no longer meet the need for high-throughput, large-scale data analysis. Moreover, limitations of throughput, accuracy, and cost make large-scale application difficult. As a fast, low-cost alternative, drug target prediction based on data mining is receiving increasing attention.

Against this background, this thesis studies drug target prediction based on imbalanced data mining, with the aim of accelerating target discovery and reducing its cost. Predicting drug targets from a large pool of proteins is a typical imbalanced-data problem, and classifier accuracy degrades to varying degrees on such data. At the data level, therefore, a synthetic minority oversampling technique (SMOTE) based on a genetic algorithm is used to preprocess the data; it augments the minority class and balances the ratio of drug targets to non-drug targets. An ensemble learning SVM classifier is then used to predict drug targets. Compared with a single SVM classifier, this approach improves the generalization performance of the prediction model.

To demonstrate the effectiveness of the proposed method, two data sets are constructed: one consists of all human protein data, and the other consists of human G protein-coupled receptors, which account for a high proportion of drug targets. Primary sequence features, polypeptide features, and physicochemical property features of the proteins are extracted as the feature space for training the classifier. Feature selection reduces the learning burden on the classifier, and the optimal classifier is then constructed by tuning the model parameters. In the experiments, the SVM classifier and the AdaBoost-SVM classifier are each used to classify the data sets. Running the two classifiers before and after data preprocessing yields four sets of experimental results, which verify the effectiveness of the proposed method. The results show that the method can effectively predict drug targets and can provide a preliminary reference for drug researchers and developers.
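
The data-level step described in the abstract can be illustrated with a minimal sketch of plain SMOTE. This is not the genetic-algorithm-based variant the thesis proposes, whose details the abstract does not give. The function name `smote`, its parameters, and the two-dimensional toy feature vectors are all hypothetical and chosen only for illustration; real inputs would be the protein feature vectors described above.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (plain SMOTE, illustrative only).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation position in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up toy data (assumption): 4 "drug target" vectors vs. 20 "non-target" vectors.
targets = [[0.9, 0.8], [1.0, 1.1], [1.2, 0.9], [0.8, 1.0]]
non_targets = [[float(i % 5), float(i // 5)] for i in range(20)]

# Oversample the minority class until both classes have the same size.
synthetic = smote(targets, n_new=len(non_targets) - len(targets))
balanced_targets = targets + synthetic
print(len(balanced_targets), len(non_targets))
```

Because every synthetic point is an interpolation between two existing minority samples, the oversampled class stays inside the region already occupied by known drug targets. A balanced set of this kind would then be passed to the classifier, for example the ensemble SVM mentioned in the abstract.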

Keywords/Search Tags:

drug target, data mining, support vector machine, ensemble learning, synthetic minority over-sampling technique

Related items

1	Application Of Improved Support Vector Machine To The Diagnosis Of Benign And Malignant Breast Tumors
2	Research On Data Mining Of Blood Glucose Spectrum Based On Machine Learning
3	Research On FMRI Data Classification Based On Independent Component Analysis And Ensemble Learning
4	Microcalcification Clusters Detection Based On Subspace Learning And Support Vector Machine
5	Research On Intelligent Diagnosis And Decision Support Of Pregnancy-induced Hypertension Based On Unbalanced Dat
6	Mining And Analysis Of Drug Sampling Data In 11 Provinces Including Beijing, Zhejiang, Shanghai, Guangdong, Etc. From 2014 To The First Half Of 201
7	Prediction Of Drug-target Interactions Based On Multiinformation Fusion And Machine Learning
8	Ensemble And Machine Learning-based Chemometrics For Metabolomics Data Analysis Associated With Inborn Errors Of Metabolism
9	Research On Training Method Of Support Vector Machine And Its Application In Disease Diagnosis
10	Breast Cancer Analysis And Predictive Diagnosis Based On Data Mining