
A Kind Of Predictability Index Based On Segment Decision Coefficient And Its Application

Posted on: 2021-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2491306518487894
Subject: Master of Agriculture
Abstract/Summary:
The detection of the toxicity of environmental pollutants is of great significance for environmental governance. Conventional experimental detection is time-consuming and labor-intensive, and leakage of pollutants may cause secondary pollution. Quantitative structure-activity relationship (QSAR) modeling is an effective supplement to experimental detection: the structural characteristics of a compound can be used to predict its toxicity. Selecting suitable features for prediction is a key issue in QSAR research, and accurately measuring the correlation between two variables is the basis of feature selection. The Pearson correlation coefficient R captures only linear correlations. The maximal information coefficient (MIC) can capture nonlinear correlations, but it does not reflect predictability. For example, for the functional relationship (Y - 0.5)^2 = X with Y ∈ [0, 1], the correlation between X and Y reaches its maximum of 1 (MIC(X, Y) = MIC(Y, X) = 1), yet a support vector regression (SVR) model based on X cannot accurately predict Y, and its predictability is about 0. Based on Fisher's optimal segmentation strategy and the principle of structural risk minimization in statistical learning theory, this thesis proposes the Adjusted Maximum Predictability Coefficient (AMPC). A measure with good equivalence (equitability) should give similar scores to different functions at the same noise level; AMPC obtained nearly perfect equivalence on 10 different functions. On five different functions, the statistical power of AMPC is, overall, significantly better than that of R^2, dCor, MIC, and ChiMIC. AMPC was then used as both the redundancy measure and the relevance measure in the minimum redundancy maximum relevance (mRMR) feature selection method, and a new method, AMPC_share, was developed to select the optimal feature subset. On the Housing data set, in a comparison with nine feature subset selection methods such as PLSR, STEP, and KNN-FABC, AMPC_share obtained the best results using the fewest features. On three QSAR data sets of alcohol and phenol compounds, SVR models built on the features selected by AMPC_share achieved independent-prediction R^2 values of 0.949, 0.936, and 0.983, which are better than those reported in the literature while using fewer features. The predictive ability of AMPC as a regression model was further verified on 555 causality data sets, with SVR models using a linear kernel and a radial basis function kernel serving as references for independent prediction. The results show that the prediction accuracy of AMPC is better than that of the linear-kernel SVR on 536 data sets and better than that of the radial-basis-kernel SVR on 535 data sets, and that its computation time is significantly lower than that of the SVR models.
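The asymmetry in the (Y - 0.5)^2 = X example can be illustrated with a small sketch. It does not implement AMPC: it assumes a much simpler stand-in, a segment-wise coefficient of determination in which the predictor is cut into equal-frequency segments (not Fisher's optimal segmentation) and the response is predicted by its segment mean. The function name `binned_r2`, the segment count, and the sample size are hypothetical choices made only for this illustration.

```python
import random


def binned_r2(x, y, n_bins=20):
    """In-sample R^2 of predicting y by its mean within equal-frequency segments of x.

    Simplified stand-in for a segment-based predictability score; this is
    NOT the AMPC of the thesis (no optimal segmentation, no adjustment).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # indices sorted by predictor
    y_mean = sum(y) / n
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    ss_res = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        seg_mean = sum(y[i] for i in idx) / len(idx)
        ss_res += sum((y[i] - seg_mean) ** 2 for i in idx)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
y = [rng.random() for _ in range(2000)]   # Y uniform on [0, 1]
x = [(v - 0.5) ** 2 for v in y]           # X = (Y - 0.5)^2, noise-free

r2_y_from_x = binned_r2(x, y)  # how well X predicts Y
r2_x_from_y = binned_r2(y, x)  # how well Y predicts X

print("predict Y from X:", round(r2_y_from_x, 3))
print("predict X from Y:", round(r2_x_from_y, 3))
```

Under these assumptions the first score stays near 0, because each value of X corresponds to two values of Y that are symmetric around 0.5, while the second score is close to 1. A symmetric dependence measure assigns the same value to both directions, which is the gap a predictability index is meant to close.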
Keywords/Search Tags: Adjusted maximum predictability coefficient (AMPC), predictability, nonlinear correlation, feature selection