| China is a large, populous country, and wheat is one of its most important staple foods. How to determine the chemical composition of wheat and evaluate its quality indicators quickly, efficiently and nondestructively is a topic of interest in many countries. Demand for high-quality wheat in many domestic industries has driven the development of nondestructive wheat testing. However, many existing near-infrared (NIR) instruments are bulky and expensive, which makes them unsuitable for on-site analysis and on-line testing and is a major barrier to the wider adoption of NIR spectroscopy. Although NIR measurement methods for water, crushed-grain protein and whole-grain protein have been recognized by the International Committee for Standardization, the complexity of the samples leads, in NIR data processing, to weak information in overlapping spectral peaks and to calibration models that transfer and generalize poorly. As a result, NIR spectroscopy techniques and instruments are not yet widely used worldwide, and the ideas and methods used in modeling remain active research topics. Wheat NIR spectral data are characterized by complex composition, high variability and uncontrolled natural sampling, so these problems need to be solved urgently. This paper studies nondestructive wheat testing technology against this background. The main contents are as follows: 1. First, existing NIR instruments are briefly introduced, and the commonly used NIR spectral preprocessing and modeling methods are discussed in detail. The preprocessing methods include smoothing, derivatives, the wavelet transform (WT), multiplicative scatter correction (MSC), standard normal variate (SNV) transformation and orthogonal signal correction (OSC). 
Modeling methods include partial least squares (PLS) and the support vector machine (SVM). The criteria for model evaluation are also given. Further, an NIR diffuse reflectance spectrum measurement system and an NIR diffuse transmission reflectance spectrum measurement system have been designed, based on the main methods currently used to measure the NIR spectrum of wheat. These systems comprise an optical fiber coupling system and a light source. For the optical fiber coupling system, a discrete, distributed double-ring collection structure is designed, in which optical fibers arranged in layered rings collect the light diffused by the sample. The design is based on the ideas of optical expansion and weighted-average distributed sampling. The structure consists of 19 directional receiving points divided into two ring layers: the first ring is at an angle of 30° to the illuminated surface (9 points) and the second ring is at 60° (10 points). Each receiving point is an optical fiber collector (to increase the receiving solid angle, it can be fitted with a coupling lens). This solves problems of the traditional integrating-sphere sampling structure, such as the small aperture of the sphere and the strong influence of sample attitude on the measurement. For the light source system, a long-life halogen cup lamp is used. The condenser is a reflective condenser with a pre-collimating lens. Filters and the condenser structure are used to ensure that the light is concentrated and well directed, with satisfactory radiation characteristics. 2. For the wheat moisture model, based on the ideas of granular computing (GrC) and supervised learning, features of the wheat NIR spectrum are successfully extracted by wavelet multi-scale decomposition. 
The representative wavelet coefficients are selected to reconstruct the spectrum and establish the prediction model. The root mean square error of cross validation (RMSECV) of the wheat NIR moisture prediction model decreased from 0.4887 for the raw spectra to 0.2910, a reduction of 40.5%, which optimizes the model and greatly improves its prediction accuracy. 3. Commonly used variable selection methods are introduced, including uninformative variable elimination (UVE), the successive projections algorithm (SPA) and their combination (UVE-SPA). For the wheat protein model, the continuous wavelet transform (CWT) and MSC are adopted to preprocess the raw spectra, and the results of the different variable selection methods are carefully analyzed. A variable selection method based on the latent projection graph (LPG) is introduced, and its processing steps are given in detail. The modeling results are then discussed for the SVM, CWT-SVM, CWT-MSC-SVM, CWT-MSC-UVE-SVM, CWT-MSC-SPA-SVM, CWT-MSC-UVE-SPA-SVM, CWT-LPG-SVM and CWT-MSC-LPG-SVM models, and the corresponding model evaluations are given. Of all the models, the CWT-MSC-LPG-SVM model performs best: the number of variables is reduced by 90% and the root mean square error of prediction (RMSEP) by 34%, so the accuracy of the wheat protein prediction model is greatly improved. 4. The idea of model population analysis (MPA) is elaborated. MPA starts from the collected samples and uses the Monte Carlo sampling (MCS) technique to divide them into sub-datasets. In this paper, all 93 collected wheat samples are used to generate 500 sub-datasets by MCS. A sub-model is then built for each sub-dataset using the LPG method and the PLS method, and the 500 RMSEP values of the sub-models are summarized. 
Finally, the results are discussed in the sample space, variable space, parameter space and model space respectively, so that the information of interest can be selected by statistical analysis in these spaces. In this paper, the 500 root mean square error (RMSE) values are analyzed statistically, and the 42 sub-models with large RMSE are removed. The variables selected in the remaining 458 sub-models are then analyzed statistically, and the 12 variables that occur with high frequency are taken as the characteristic variables. The modeling results of the different variable selection methods are compared and analyzed. Among them, the CWT-MSC-MC-LPG-PLS model, based on the idea of MPA, reduces the number of variables by 95% and improves model accuracy by 51%, making it better suited to NIR spectral modeling for wheat protein prediction. |
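
The MPA workflow summarized in item 4 (Monte Carlo sub-datasets, one sub-model per sub-dataset, removal of the worst sub-models, then counting how often each variable is selected) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the LPG variable selection is replaced by a simple correlation ranking, the sub-model is a one-component PLS1 regression written out by hand, and the names `mpa_select`, `fit_pls1`, `train_frac` and `k` are hypothetical. Only the default counts (500 sub-datasets, 42 discarded sub-models, 12 retained variables) are taken from the text.

```python
import math
import random
from collections import Counter


def column(X, j, rows):
    return [X[i][j] for i in rows]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def corr(a, b):
    denom = math.sqrt(cov(a, a) * cov(b, b))
    return cov(a, b) / denom if denom > 0 else 0.0


def fit_pls1(X, y, rows, variables):
    """One-component PLS1 on the chosen variables, fitted on `rows` only."""
    y_tr = [y[i] for i in rows]
    x_means = {j: mean(column(X, j, rows)) for j in variables}
    y_mean = mean(y_tr)
    # The first PLS weight vector is proportional to X_c^T y_c (covariances).
    w = {j: cov(column(X, j, rows), y_tr) for j in variables}
    t = [sum(w[j] * (X[i][j] - x_means[j]) for j in variables) for i in rows]
    tt = sum(s * s for s in t)
    q = sum(s * (yi - y_mean) for s, yi in zip(t, y_tr)) / tt if tt > 0 else 0.0

    def predict(i):
        score = sum(w[j] * (X[i][j] - x_means[j]) for j in variables)
        return y_mean + q * score

    return predict


def rmse(pred, true):
    return math.sqrt(mean([(p - t) ** 2 for p, t in zip(pred, true)]))


def mpa_select(X, y, n_sub=500, train_frac=0.8, k=5, n_drop=42, n_keep=12, seed=0):
    """Return (most frequently selected variables, sorted sub-model RMSEPs)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    runs = []
    for _ in range(n_sub):
        # Monte Carlo sampling: a random train/test split per sub-dataset.
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(train_frac * n)
        train, test = idx[:cut], idx[cut:]
        y_tr = [y[i] for i in train]
        # Assumption: correlation ranking stands in for LPG variable selection.
        ranked = sorted(range(p), key=lambda j: -abs(corr(column(X, j, train), y_tr)))
        chosen = ranked[:k]
        predict = fit_pls1(X, y, train, chosen)
        err = rmse([predict(i) for i in test], [y[i] for i in test])
        runs.append((err, chosen))
    # Discard the sub-models with the largest RMSEP, then count selections.
    runs.sort(key=lambda r: r[0])
    kept = runs[:len(runs) - n_drop]
    freq = Counter(j for _, chosen in kept for j in chosen)
    return [j for j, _ in freq.most_common(n_keep)], [e for e, _ in runs]
```

Called with its defaults on a 93-sample spectral matrix, the sketch mirrors the counts given above: 500 sub-models are built, the 42 with the largest RMSEP are dropped, and the 12 most frequently selected variables among the remaining 458 are returned.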