Improved Support Vector Regression And Its Application In Plant Protection

Posted on: 2009-10-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Q Tan
GTID: 1103360272995415
Subject: Plant pathology
Abstract/Summary:
Regression modeling is widely studied in plant protection, and most of that work concerns non-linear models. Traditional linear methods, such as Multiple Linear Regression (MLR) and Stepwise Linear Regression (SLR), are of limited use. Non-linear methods based on Empirical Risk Minimization (ERM), such as the Artificial Neural Network (ANN), are good at nonlinear approximation, but they struggle with high dimensionality and local minima and tend to overfit severely on small samples, which carries a high risk of prediction error. Statistical Learning Theory (SLT) grew out of research on statistical estimation with small samples. Its major contribution is the Structural Risk Minimization (SRM) principle, on which the Support Vector Machine (SVM) learning method is based. SVM offers efficient and powerful algorithms that can handle high-dimensional, non-linear, small-sample problems. It covers classification (Support Vector Classification, SVC) and regression (Support Vector Regression, SVR), and has the advantages of global optimization and strong generalization ability. SVM has been used in many fields, but few applications in plant protection have been reported. This dissertation studies the application of SVR in plant protection extensively. Several weaknesses of SVR were addressed with newly proposed algorithms: the lack of rules for kernel selection, dimensionality reduction for high-dimensional data, poor interpretability, and the low confidence level of models. Based on these improvements, two kinds of SVR modeling in plant protection were analyzed systematically and in depth: longitudinal data regression (exemplified by multidimensional time series analysis) and non-longitudinal data regression (exemplified by quantitative structure-activity relationship modeling of pesticides and optimization of feed formulation).
The main conclusions are as follows. (1) Several shortcomings of SVR were addressed. Kernel selection for SVR lacks a theoretical basis and depends only on experience. The author developed a method that automatically selects the optimal kernel from four common kernels according to the minimum-MSE principle. It is unreasonable to reduce dimensionality by selecting nonlinear descriptors with a linear method such as stepwise linear regression. Multi-round Optimization was therefore proposed: starting from an SVR model that includes all input descriptors, the descriptors that do not help prediction precision are eliminated gradually, in a nonlinear way, according to leave-one-out validation and the minimum-MSE principle, and those left are the retained descriptors. Because SVR lacks an explicit expression, its results are hard to interpret. The author proposed Multi-round Compulsory Optimization on the basis of Multi-round Optimization. This method ranks the descriptors by their influence on prediction precision, so the model gains some explanatory capacity. To reinforce the reliability of SVR models with small samples, the author developed the Secondary Leave-one-out method.
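The elimination loop behind Multi-round Optimization can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's implementation: the function names (`knn_predict`, `loo_mse`, `multi_round_elimination`) are hypothetical, the stopping rule (stop when no removal lowers the leave-one-out MSE) is an assumption, and a 1-nearest-neighbour regressor stands in for SVR so that the sketch needs no external library.

```python
def knn_predict(train_X, train_y, x, k=1):
    """Stand-in regressor (NOT SVR): mean target of the k nearest training rows."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_mse(X, y, columns, predict=knn_predict):
    """Leave-one-out mean squared error using only the given descriptor columns."""
    errors = []
    for i in range(len(X)):
        train_X = [[row[c] for c in columns] for j, row in enumerate(X) if j != i]
        train_y = [t for j, t in enumerate(y) if j != i]
        x = [X[i][c] for c in columns]
        errors.append((predict(train_X, train_y, x) - y[i]) ** 2)
    return sum(errors) / len(errors)


def multi_round_elimination(X, y, predict=knn_predict):
    """Each round, drop the descriptor whose removal lowers the LOO MSE the most.

    Assumed stopping rule: stop as soon as no single removal improves the MSE.
    """
    kept = list(range(len(X[0])))
    best = loo_mse(X, y, kept, predict)
    while len(kept) > 1:
        trials = [
            (loo_mse(X, y, [c for c in kept if c != d], predict), d) for d in kept
        ]
        trial_mse, dropped = min(trials)
        if trial_mse >= best:
            break
        kept.remove(dropped)
        best = trial_mse
    return kept, best


# Toy data: column 0 carries the signal, column 1 is a misleading descriptor.
X = [[0, 0], [1, 100], [2, 200], [3, 300], [4, 0], [5, 100], [6, 200], [7, 300]]
y = [0, 1, 2, 3, 4, 5, 6, 7]
kept, best = multi_round_elimination(X, y)  # kept == [0], best == 1.0
```

The same `loo_mse` helper could score candidate kernels instead of descriptor subsets by passing a different `predict` function for each kernel and keeping the one with the lowest error.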
After normalization with the optimal kernel and the retained descriptors, the optimal SVR parameters are searched by the leave-one-out method, with which the samples are trained and predictions are then made. Validation showed that Secondary Leave-one-out is similar to independent testing. A basic technical framework was constructed for regression analysis based on the improved SVR.

(2) Combined prediction methods based on SVR were developed for QSAR modeling. Because a combined model is more precise than a single model, two kinds of SVR-based combined models were constructed for pesticides. First, because most data are heterogeneous, kernel optimization and descriptor selection were carried out based on SVR, and combined prediction was then done by Secondary Leave-one-out and KNN. Both samples and descriptors were optimized, so the precision is high. Second, because modeling is relatively hard with small samples, another combined model was constructed for small-sample QSAR studies, again with kernel optimization and descriptor selection before prediction. This model combines a local kernel (RBF kernel) and a global kernel (polynomial kernel), and its precision is clearly higher than that of linear methods. The two methods were applied to QSAR modeling of different pesticides, and the results are better than those reported in the literature.

(3) The author optimized complex culture media with multiple parameters and multiple levels based on SVR. It is valuable to optimize a formulation and analyze factor effects with only a few experiments. Taking the optimization of culture media for diamondback moth (DBM) as an example, theoretical research on SVR models in media optimization was carried out. Starting from the initial composition, Secondary Leave-one-out was carried out after kernel optimization and descriptor screening. The precision is higher than that of the linear regression model, which shows that SVR is suitable for media optimization. Frequency statistics over all combinations determine whether factor levels should be extrapolated. Earlier methods of model evaluation have to compare MSE values against other models. This dissertation constructed a method based on the F test to optimize culture media and analyze factor effects. It is contradictory to analyze factor effects from the first- and second-order term coefficients of a quadratic polynomial, so a new method was put forward to explain and evaluate factor effects by an F test based on the partial regression sum of squares. Evaluation of single-factor effects and of two-factor interactions was proposed at the same time. The effectiveness of this method was evaluated in a real experiment. Uniform design and SVR were combined to optimize a culture medium with 12 factors and 5 levels for Streptomyces hygroscopicus var. Jing-gangensis Yen. The OD560 of the satisfactory composition is 2.22, clearly higher than the initial OD560 (1.72), with only 6 factors. The model explains factor effects reasonably and is powerful for optimizing culture media.

(4) Multi-dimensional time series were analyzed with the GS-SVR model, which was built on geostatistics and SVR. The model needs to characterize both the effects of environmental factors and the dynamic characteristics of the series, and the length of the dynamic characteristics is hard to determine. The author analyzed the structure of the data with the semivariogram of geostatistics and used it to define the expansion order of the time series, to avoid a locally optimal order-expansion result. The effects of historical environmental factors on the predicted variable are already embedded in the historical values of the variable, so the historical environmental factors only need to be expanded by one year. Kernel optimization and nonlinear descriptor selection after order expansion, followed by principal component analysis (PCA), reduce data redundancy. Finally, independent prediction was carried out with SVR. Prediction models were constructed for the diseased panicle rate of wheat scab and the damage degree of the second-generation corn borer, and the results showed that the SVR-based methods have the advantages of high prediction precision and stability.
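The data preparation in conclusion (4) can be illustrated with a small sketch: an empirical semi-variation function (semivariogram) of a yearly series, and the expansion of that series into lagged inputs. This is an assumption-laden reading of the abstract, not the GS-SVR code itself. The names `semivariogram` and `lag_expand` are hypothetical, the expansion order is passed in by hand because the abstract does not state the rule for reading it off the semivariogram, and the environmental factors are taken from the previous year only.

```python
def semivariogram(series, max_lag):
    """Empirical semivariogram gamma(h) of a 1-D series for h = 1..max_lag.

    gamma(h) = sum over t of (z[t+h] - z[t])^2, divided by 2 * number of pairs.
    """
    if max_lag >= len(series):
        raise ValueError("max_lag must be smaller than the series length")
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = [(series[t + h] - series[t]) ** 2 for t in range(len(series) - h)]
        gammas.append(sum(diffs) / (2 * len(diffs)))
    return gammas


def lag_expand(target, factors, order):
    """Build one training row per year t >= order.

    Inputs : target values of the previous `order` years, plus the
             environmental factors of year t-1 only (assumed one-year lag).
    Output : target value of year t.
    """
    X, y = [], []
    for t in range(order, len(target)):
        row = list(target[t - order:t]) + list(factors[t - 1])
        X.append(row)
        y.append(target[t])
    return X, y


# Toy yearly series with one environmental factor per year.
target = [1, 2, 3, 4, 5, 6]
factors = [[10], [20], [30], [40], [50], [60]]
gammas = semivariogram(target, max_lag=2)    # [0.5, 2.0]
X, y = lag_expand(target, factors, order=2)  # X[0] == [1, 2, 20], y[0] == 3
```

The rows produced by `lag_expand` would then go through descriptor selection and PCA before being passed to an SVR model.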
Keywords/Search Tags: support vector machine, regressive prediction, quantitative structure-activity relationship, culture medium optimization, time series analysis