| Near-infrared (NIR) spectral data often contain a large amount of redundant information and some noise, both of which can degrade the predictive performance of NIR spectral prediction models. Before modeling, it is therefore necessary to select characteristic wavelengths from the NIR spectral data, reduce the number of variables to a reasonable level, highlight the relationship between the characteristic variables and the response variable, and thereby optimize the prediction model. Among the many wavelength selection methods, the correlation coefficient method calculates the correlation between each independent variable and the dependent variable and retains the wavelengths with the larger correlation coefficients for modeling. The method is simple to compute, has low complexity, and is highly interpretable. However, it does not account for interactions between wavelengths (variables), collinearity between wavelengths (variables), and related issues. To address these issues, this article builds on the correlation coefficient method and extracts model variables by reducing the correlation between wavelengths while strengthening the correlation between the independent and dependent variables. Two methods are proposed: a two-stage correlation coefficient wavelength selection method (TSCC) based on the one-dimensional correlation coefficient, and a two-dimensional correlation coefficient wavelength selection method of important variables (2DIV) based on the two-dimensional correlation coefficient. The specific research content is as follows: (1) A TSCC algorithm based on the one-dimensional correlation coefficient is proposed. Building on the one-dimensional correlation coefficient algorithm (OCC), the OCC algorithm is applied a second time to select, for modeling, the variables that are less correlated with the other variables. The TSCC algorithm is applied to two publicly available NIRS datasets (a corn dataset and a soil dataset) to establish multiple linear regression models and partial least squares
regression models, and compare the predictive performance against other wavelength selection methods (the successive projections algorithm and the OCC algorithm). The results show that the TSCC algorithm eliminates multicollinearity between wavelengths while maintaining a high correlation between the dependent and independent variables, so the selected subset of variables is suitable for multiple linear regression modeling. Moreover, the predictive ability of the resulting model is superior to that of methods such as the OCC algorithm and the successive projections algorithm, making TSCC an effective wavelength selection algorithm. (2) A 2DIV algorithm based on two-dimensional correlation coefficients is proposed. To capture the interactions among more characteristic wavelengths and to strengthen the correlation between the spectral data and the content of the chemical substance to be measured, the two-dimensional correlation coefficient is combined with competitive adaptive reweighted sampling (CARS) and variable importance in projection (VIP), and the correlation coefficients between important variables are calculated for wavelength selection. Five preprocessing methods (first-order derivative, second-order derivative, multiplicative scatter correction, standard normal variate transformation, and the Savitzky-Golay first-order derivative) are applied to the original data, and the two-dimensional correlation coefficients of the five preprocessed spectral datasets are calculated. The appropriate preprocessing method is selected according to the distribution of the two-dimensional correlation coefficient contour map and the magnitude of the correlation coefficients. The preprocessed data are divided into a modeling set and a prediction set in a 3:1 ratio. CARS is used to select the variables that are highly correlated with the content of the chemical substance to be measured, and on this basis the VIP method is then used to select
the independent variables that contribute most to the prediction of chemical composition. Finally, the two-dimensional correlation coefficient between the spectral index and the content of the chemical substance to be measured is constructed for characteristic wavelength selection, reducing the number of variables in the prediction model and improving prediction accuracy. The results show that, for the same spectral index, the prediction model built on the characteristic wavelengths selected by the 2DIV algorithm performs better than the models built on wavelengths selected by other wavelength selection methods. Among these, the 2DIV wavelength selection method based on the ratio index performs best in predicting the lignin content of corn straw, with a prediction-set coefficient of determination of 0.9310 and a prediction-set root mean square error of 0.3597. |
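
The two-stage idea behind TSCC described above (first keep wavelengths that correlate strongly with the response, then keep only those that are weakly correlated with one another) can be illustrated with a minimal sketch. This is not the author's implementation: the function names `pearson` and `two_stage_select`, the parameters `n_stage1` and `max_inter_corr`, and the greedy rule used in the second stage are all assumptions made for illustration, and the code uses only the Python standard library to stay self-contained.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    # Assumption: neither sequence is constant, so the denominator is non-zero.
    return cov / (norm_u * norm_v)


def two_stage_select(columns, y, n_stage1, max_inter_corr):
    """Return indices of selected wavelength columns.

    columns: list of columns, one per wavelength (each a list of readings).
    y: response values, one per sample.
    n_stage1 (hypothetical): how many wavelengths survive stage 1.
    max_inter_corr (hypothetical): largest |r| allowed between two kept wavelengths.
    """
    # Stage 1: rank wavelengths by |r(x_j, y)| and keep the top n_stage1.
    relevance = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    stage1 = ranked[:n_stage1]

    # Stage 2 (assumed greedy rule): walk the stage-1 list in order of relevance
    # and keep a wavelength only if it is weakly correlated with every
    # wavelength already kept.
    selected = []
    for j in stage1:
        if all(abs(pearson(columns[j], columns[k])) < max_inter_corr
               for k in selected):
            selected.append(j)
    return selected
```

With a spectra matrix stored column-wise, a call such as `two_stage_select(columns, y, n_stage1=50, max_inter_corr=0.9)` would return the indices of wavelengths that are both relevant to `y` and mutually non-redundant; both values here are placeholders and would need tuning, for example by cross-validation, on real data.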