| With the rapid development of modern spectroscopic detection technology, near-infrared, mid-infrared, and Raman spectroscopy have been widely used in industry, agriculture, food, medicine, and other fields. However, the high-dimensional infrared and Raman spectral data collected in actual production often contain disturbance noise, which makes it difficult to obtain ideal prediction results with conventional modeling strategies. It is therefore necessary to design a new modeling strategy for analyzing and mining high-dimensional spectral data. At present, the classical algorithms commonly used in spectral modeling and analysis mainly include principal component regression and partial least squares (PLS). Although the single quantitative analysis model constructed by PLS is widely used in spectral data mining and analysis, it is prone to overfitting or excessive model complexity when faced with high-dimensional spectral data, which hinders the mining and analysis of the high-value feature variable information hidden in such data. Building on feature variable selection algorithms and an unsupervised clustering algorithm (the self-organizing map, SOM), this thesis adopts a consensus fusion strategy and constructs four consensus fusion models for the modeling and analysis of high-dimensional spectral data: a consensus fusion model based on residual information, a continuous CARS-PLS consensus fusion model, a multivariate consensus fusion model, and a consensus fusion model based on the SOM unsupervised clustering algorithm. Among them, the continuous CARS-PLS consensus fusion model has the best prediction performance on the bayberry (near-infrared spectrum), ‘Yunhe’ pear (near-infrared spectrum), and methanol gasoline (mid-infrared spectrum) data; compared with the conventional PLS model, it improves the training-set results by 15.3%, 11.1%, and 15.1% and the prediction-set results by 14.6%, 9.5%, and 10.3%, respectively. The consensus fusion model based on residual information has the best prediction performance on the methanol gasoline (Raman spectrum) data, where it is 9.2% and 11.9% better than the conventional PLS model on the training set and prediction set, respectively. The main research contents are as follows. First, starting from the basic modeling strategy of the traditional single model, the consensus fusion strategy is used to improve that model on the basis of the feature variable selection algorithm and the unsupervised clustering algorithm, and four main consensus fusion models are established for high-dimensional spectral data. Second, to test the modeling performance of the four improved consensus fusion models, the visible/near-infrared spectra of ‘Yunhe’ pears, the near-infrared spectra of bayberry, and the mid-infrared and Raman spectra of methanol gasoline, all collected in actual production, are used as the research objects, with the traditional PLS model and the multivariable feature selection model introduced as baseline reference models. The experimental results show that, in the modeling and analysis of high-dimensional spectral data, the consensus fusion strategy improves the prediction performance of the PLS model to some extent, enhances robustness, and reduces overfitting, thereby enabling the analysis and mining of the high-value feature variable information hidden in high-dimensional spectral data. |
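
The consensus fusion idea summarized above, combining the predictions of several member models instead of relying on a single PLS model, can be illustrated with a minimal sketch. The abstract does not state how the member predictions are combined, so the inverse-squared-RMSE weighting used here is an assumption, and the function names `rmse`, `consensus_weights`, and `consensus_predict` are hypothetical. The member predictions stand in for the outputs of separately trained sub-models (for example, PLS models built on different variable subsets).

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def consensus_weights(member_rmse, eps=1e-12):
    """Assumed fusion rule: weight each member by 1 / RMSE^2, normalized to sum to 1.

    eps guards against division by zero for a member with zero validation error.
    """
    raw = [1.0 / (e * e + eps) for e in member_rmse]
    total = sum(raw)
    return [w / total for w in raw]


def consensus_predict(member_predictions, weights):
    """Fuse member predictions sample by sample as a weighted average."""
    if len(member_predictions) != len(weights):
        raise ValueError("one weight is required per member model")
    n_samples = len(member_predictions[0])
    if any(len(p) != n_samples for p in member_predictions):
        raise ValueError("all members must predict the same number of samples")
    return [
        sum(w * p[i] for w, p in zip(weights, member_predictions))
        for i in range(n_samples)
    ]


# Illustrative values only (not data from the thesis): two member models
# evaluated on three validation samples.
y_val = [1.0, 2.0, 3.0]
member_preds = [[1.1, 2.1, 3.1], [0.8, 2.2, 2.8]]
member_errors = [rmse(y_val, p) for p in member_preds]
weights = consensus_weights(member_errors)
fused = consensus_predict(member_preds, weights)
```

In this toy case the member with the smaller validation error receives the larger weight, and the fused prediction has a lower RMSE than either member alone. In practice the weights should be estimated on validation data that is separate from the prediction set, so that the fused model's reported performance is not optimistic.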