Classification Of Protein And Prediction Of Fluorine-containing Pesticides' Bioactivities Based On SVM

Posted on: 2008-09-10  Degree: Master  Type: Thesis
Country: China  Candidate: X S Tan
GTID: 2120360218454589  Subject: Crop Science
Abstract/Summary:
Experimentally determining protein structures and measuring the bioactivity of new fluorine-containing compounds are both time-consuming, laborious, and costly: the former is experimentally difficult, and the latter has a negative impact on the environment. Researchers have therefore sought to classify protein structures by theoretical or computational methods, and to predict the bioactivity of new fluorine-containing compounds through quantitative structure-activity relationship (QSAR) modeling. In this work, support vector machines (SVM), coupled with combinatorial prediction, were applied systematically to the classification of protein quaternary structure and to the bioactivity prediction of fluorine-containing pesticides, with the aim of supporting research on protein structure and function and on the design and synthesis of fluorine-containing compounds.
To further improve the precision of protein quaternary structure classification, we applied four feature extraction methods with support vector classification (SVC). The amino acid composition, dipeptide composition, and amino acid composition distribution methods were improved, and a pseudo amino acid composition method was built on new structural descriptors of amino acids. The classification precision of all four models improved by 2 to 3%. Introducing combinatorial prediction raised the precision by a further 2 to 3%, so that the precision on the independent test set exceeded 90%.
To better understand the QSAR of fluorine-containing pesticides and to build significant structure-bioactivity models, we developed a new prediction approach based on support vector regression (SVR). It selects the best kernel automatically, screens descriptors nonlinearly, and performs combinatorial prediction by constructing sub-models with the K-nearest neighbor (KNN) method.
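The amino acid composition and dipeptide composition encodings mentioned above are, in their standard form, relative-frequency vectors over the 20 standard residues. The following is a minimal Python sketch assuming those standard definitions (20-D and 400-D); it does not reproduce the thesis's improved variants, the composition distribution method, or the pseudo amino acid composition descriptors, and the function names are hypothetical.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); this order fixes the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """20-D vector: fraction of (standard) residues that are each amino acid."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]  # drops X, B, Z, ...
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """400-D vector: fraction of adjacent residue pairs that are each dipeptide."""
    s = seq.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    # Pairs containing a non-standard residue are dropped, not bridged over.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    n = len(pairs)
    if n == 0:
        return [0.0] * len(DIPEPTIDES)
    counts = {d: 0 for d in DIPEPTIDES}
    for p in pairs:
        counts[p] += 1
    return [counts[d] / n for d in DIPEPTIDES]


if __name__ == "__main__":
    seq = "MKVLAAGIVGAL"  # toy sequence, not from the thesis data
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    print(len(aac), len(dpc))  # 20 400
    print(round(sum(aac), 6), round(sum(dpc), 6))  # 1.0 1.0
```

Each vector would then serve as the input to a support vector classifier. Skipping non-standard residues is a choice made for this sketch only; the abstract does not say how the thesis handles them.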
Using this model, the bioactivities of 33 fluorine-containing pesticidal compounds acting against five different diseases were predicted by leave-one-out cross-validation. The results indicate that screening both the descriptors and the sub-models is essential, and that combinatorial prediction after sub-model screening is more precise than a single model, with MSE between 0.005 and 0.015 and MAPE between 2.136 and 3.164. The leave-one-out test also showed that SVR-CKNN had the highest prediction precision and stability among all reference models built from the same dataset. The approach should be widely applicable to QSAR prediction and related fields.
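The leave-one-out evaluation and the combinatorial prediction from screened sub-models described above can be illustrated with a small Python sketch. It is not the thesis's SVR-CKNN: SVR and the automatic kernel search are omitted, each hypothetical sub-model is a plain K-nearest-neighbor regressor on one subset of descriptors, screening keeps the subsets with the lowest single-model leave-one-out MSE, and the combined forecast is the unweighted mean of the kept sub-models. MAPE is reported in percent here, which is an assumption of the sketch.

```python
import math


def knn_predict(train_X, train_y, x, k, features):
    """Mean target of the k nearest training points (Euclidean distance on `features`).

    Ties in distance are broken by training order (Python's sort is stable).
    """
    dists = []
    for xi, yi in zip(train_X, train_y):
        d = math.sqrt(sum((xi[j] - x[j]) ** 2 for j in features))
        dists.append((d, yi))
    dists.sort(key=lambda t: t[0])
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def combined_predict(train_X, train_y, x, k, feature_subsets):
    """Combinatorial forecast: unweighted mean of the sub-model predictions."""
    preds = [knn_predict(train_X, train_y, x, k, fs) for fs in feature_subsets]
    return sum(preds) / len(preds)


def leave_one_out(X, y, k, feature_subsets):
    """Predict each sample from a model built on all the other samples."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(combined_predict(train_X, train_y, X[i], k, feature_subsets))
    return preds


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def screen_subsets(X, y, k, candidate_subsets, keep):
    """Keep the `keep` candidate subsets whose single-sub-model LOO MSE is lowest."""
    scored = [(mse(y, leave_one_out(X, y, k, [fs])), fs) for fs in candidate_subsets]
    scored.sort(key=lambda t: t[0])
    return [fs for _, fs in scored[:keep]]


if __name__ == "__main__":
    # Toy descriptors (3 per compound) and activities: made-up numbers, not thesis data.
    X = [[0.1, 1.0, 5.0], [0.2, 0.9, 3.0], [0.4, 0.7, 4.0],
         [0.5, 0.6, 1.0], [0.7, 0.4, 2.0], [0.9, 0.2, 0.5]]
    y = [1.0, 1.2, 1.5, 1.7, 2.0, 2.3]
    candidates = [[0], [1], [2], [0, 1], [1, 2]]  # hypothetical descriptor subsets
    subsets = screen_subsets(X, y, k=2, candidate_subsets=candidates, keep=2)
    preds = leave_one_out(X, y, k=2, feature_subsets=subsets)
    print("kept sub-models:", subsets)
    print("LOO MSE  = %.4f" % mse(y, preds))
    print("LOO MAPE = %.2f%%" % mape(y, preds))
```

In this toy usage, the screening step sees every sample, so the final leave-one-out score is optimistic. For an unbiased estimate, the sub-model screening would itself have to run inside each leave-one-out fold.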
Keywords/Search Tags: support vector classification, support vector regression, combinatorial prediction, protein quaternary structure, quantitative structure-activity relationship