| Pinus koraiensis seeds, also known as northeast pine nuts, are mainly distributed in the forest areas of Changbai Mountain and the Xiaoxing’an Mountains in Northeast China, and have become an important economic crop for increasing income in forest areas. The kernel of Pinus koraiensis seeds is rich in unsaturated fatty acids that are beneficial to the human body, so the seeds are popular with consumers. Food quality supervision by China’s quality supervision agencies is also becoming stricter, which promotes the development of non-destructive testing methods for pine nut quality. The maturity year, origin, and nutrient content of pine nuts are important properties that affect the edible value and breeding value of Pinus koraiensis seeds, but they are difficult to distinguish by appearance, weight, or texture. Traditional chemical testing methods for nutrient content are time-consuming, cumbersome to operate, and destructive to samples, making it difficult to meet the needs of production testing. Near-infrared spectroscopy has become a popular non-destructive testing method in recent years because it is fast, convenient to operate, and economical. In this study, because near-infrared spectral data are high-dimensional and their key features are strongly concealed, a t-SNE-SVM identification model for Pinus koraiensis seeds based on data dimensionality reduction is proposed, which addresses the heavy computation and long run time of the earlier modeling process. The data are clustered while their dimensionality is reduced, which strengthens the input features, reduces the difficulty of training, and improves the accuracy of model identification. The original spectra are preprocessed with standard normal variate (SNV), first-derivative, and Savitzky-Golay (S-G) algorithms, several dimensionality reduction methods are applied and followed by cluster analysis, and the dimensionality reduction effects are compared. Through data
visualization and the output of clustering metrics, the comparison shows that t-SNE reduction to two dimensions is the better dimensionality reduction scheme. In this case, the silhouette coefficient, Calinski-Harabasz (CH) index, and mutual information of the two classification data sets are 0.8200, 2972.0127, 0.8742 and 0.8222, 1928.2249, 0.8883, respectively. Finally, the dimensionality reduction result is used as input to establish support vector machine (SVM) calibration models for the classification of year and place of origin. With the RBF kernel function and K = 5, the kernel parameter γ is 82.54 and 57.33 and the penalty coefficient is 383.12 and 507.37, respectively. The accuracy of the established t-SNE-SVM classification models reaches more than 97.5%. This shows that the t-SNE-SVM model can effectively identify the traits of Pinus koraiensis seeds, with high accuracy and low computational complexity. Building on the qualitative detection of seed quality, the detection of the nutrient content of Pinus koraiensis kernel powder by near-infrared spectroscopy was studied. To mine the useful information hidden in the spectral data more deeply and further eliminate noise, the wavelet transform (WT), known as the "mathematical microscope", is applied to decompose and reconstruct the spectral data, thereby achieving data compression and noise reduction. Subsequently, feature extraction is performed on the resulting wavelet coefficients, and an uninformative variable elimination (UVE) method optimized by Monte Carlo (MC) sampling is proposed, which improves the use of intra-sample correlation and addresses the tendency of the UVE algorithm to retain too many variables. Finally, the selected features are combined with partial least squares (PLS) to establish the WT-MCUVE-PLS regression model for predicting fat content. When multiplicative scatter correction (MSC) is combined with S-G convolution smoothing for preprocessing, the Donoho-threshold wavelet filter "bior4.4"
is selected, and MCUVE extracts the first 70 wavelet coefficients, the WT-MCUVE-PLS regression model performs best and predicts better than the other models compared. Its cross-validation root mean square error (RMSECV) and root mean square error of prediction (RMSEP) are the smallest, at 0.0098 and 0.0390, respectively. Its coefficient of determination R² is the largest, with values of 0.9485 and 0.9369 for the calibration set and prediction set, respectively. This shows that the WT-MCUVE-PLS regression model based on near-infrared spectroscopy can accurately characterize the fat content of Pinus koraiensis seeds: WT and MCUVE improve the quality of the model's features and ultimately its accuracy. Because of storage time, place of production, and other complex, non-quantifiable factors, the offline quantitative analysis model is not ideal for different batches of samples. Building on the quantitative analysis model, this study proposes a recursive partial least squares (RPLS) online learning model with online multiplicative scatter correction (OMSC) preprocessing, which realizes online updating of the original detection model. The online model is dynamic and continuous, so it can be used for long-term modeling. First, the OMSC algorithm is proposed to preprocess the new samples used to update the model, which addresses the prediction error caused by the lack of baseline correction of the data set in earlier model updates. Subsequently, wavelet compression is performed to reduce noise, and the number of features selected by MCUVE is moderately increased to leave more room for the selected feature bands to change while the model is updated. Finally, the processed new sample data are combined with RPLS to iterate to the final updated model. The results show that when the number of features selected by MCUVE rises to 100, the R² of the
prediction set is 0.8581 and the RMSEP is 0.0621. By comparison, the original offline model gives an R² of 0.7193 and an RMSEP of 2.1174 on the new prediction set. This shows that the method not only saves computing time and reduces workload, but also predicts well. The core of this research is the use of near-infrared spectroscopy to realize storage-period detection, origin identification, and nutrient content detection for Pinus koraiensis seeds. Chemometrics, machine learning, and online learning methods are studied in depth and combined. Models for trait identification and nutrient content detection are established to evaluate the quality of Pinus koraiensis seeds, and the offline model is updated through online learning. The work has application value for qualitative and quantitative analysis, quality inspection, and online learning research on other nut products. In the future, the online learning model will be combined with network technology to enable online updating and upgrading of the near-infrared analysis model. |
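
The preprocessing steps and evaluation metrics named in the abstract (SNV, multiplicative scatter correction, RMSEP, and the coefficient of determination) can be illustrated with a minimal sketch in plain Python. The function names `snv`, `mean_spectrum`, `msc`, `rmse`, and `r2` and the toy spectra are hypothetical and do not come from the study. The abstract does not define OMSC. Correcting a new sample against a reference spectrum stored from the calibration set, as done here, is only one plausible reading and should be treated as an assumption.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it by its own standard deviation."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def mean_spectrum(spectra):
    """Column-wise mean of equal-length spectra, used here as the MSC reference."""
    return [mean(column) for column in zip(*spectra)]


def msc(spectrum, reference):
    """Multiplicative scatter correction against a fixed reference spectrum.

    Fits spectrum ~ a + b * reference by least squares and returns (spectrum - a) / b.
    """
    mr, ms = mean(reference), mean(spectrum)
    sxx = sum((r - mr) ** 2 for r in reference)
    sxy = sum((r - mr) * (x - ms) for r, x in zip(reference, spectrum))
    b = sxy / sxx          # multiplicative (slope) term
    a = ms - b * mr        # additive (baseline offset) term
    return [(x - a) / b for x in spectrum]


def rmse(y_true, y_pred):
    """Root mean square error; called RMSEP when computed on a prediction set."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mt = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy calibration spectra (made-up numbers, not data from the study).
    calibration = [[0.10, 0.25, 0.40, 0.30, 0.15],
                   [0.12, 0.27, 0.44, 0.31, 0.16]]
    reference = mean_spectrum(calibration)

    # Assumed "online" use: a new sample with an offset and a scale change
    # is corrected against the stored calibration reference.
    new_sample = [0.3 + 1.5 * r for r in reference]
    corrected = msc(new_sample, reference)
    print(max(abs(c - r) for c, r in zip(corrected, reference)))
```

Keeping the reference spectrum fixed means that samples arriving later are mapped onto the same scale as the calibration set without recomputing the whole data set, which is the property an online update would need under this reading.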