
Studies On Model Optimization And Model Transfer Methods Of Near Infrared Spectroscopy

Posted on: 2014-10-23; Degree: Doctor; Type: Dissertation
Country: China; Candidate: K Y Zheng
GTID: 1261330425480904; Subject: Analytical Chemistry
Abstract/Summary:
Near-infrared (NIR) spectroscopy suffers from weak absorption and heavily overlapped bands, so chemometric methods are used to build models that extract the chemical information. To improve prediction ability, the models should be optimized by spectral pretreatment and variable selection; to improve their generality, calibration transfer should be applied to them.
For spectral pretreatment, this dissertation applies fractional order Savitzky-Golay differentiation to NIR spectra. Fractional order Savitzky-Golay differentiation generalizes ordinary (integer order) Savitzky-Golay differentiation, which is its special case when the order is an integer. Like the ordinary method, it first obtains the polynomial coefficients by least-squares fitting of the data within a moving window over the spectrum. Then, using Riemann-Liouville fractional calculus and these coefficients, the derivative is expressed as a linear combination of the data in the window. Consequently, no complex formula has to be evaluated: the differentiated spectra are obtained simply by multiplying the raw spectra on the right by a band diagonal matrix. The method was tested on three datasets: diesel, wheat and corn.
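A minimal Python sketch of the band-matrix construction follows. The abstract does not fix these details, so they are assumptions: the window has 2m+1 equally spaced points, the lower terminal of the Riemann-Liouville derivative lies at the left edge of each window, the derivative is evaluated at the window centre, and the first and last m wavelengths are dropped instead of padded. Under these choices the Riemann-Liouville rule D^α t^k = Γ(k+1)/Γ(k+1-α) t^(k-α) is applied to the fitted polynomial, and for integer α the weights reduce to ordinary Savitzky-Golay derivative weights. The function names are hypothetical.

```python
import numpy as np
from math import gamma


def _rgamma(x):
    """Reciprocal gamma 1/Gamma(x), taken as 0 at the poles x = 0, -1, -2, ..."""
    if x <= 0 and float(x).is_integer():
        return 0.0
    return 1.0 / gamma(x)


def fractional_sg_weights(half_window, poly_order, alpha, step=1.0):
    """Weights w such that w @ y_window approximates D^alpha y at the window centre.

    Assumptions (hypothetical choices): half_window >= 1, and the Riemann-Liouville
    lower terminal sits at the window's left edge (local coordinate t = 0).
    """
    t = np.arange(2 * half_window + 1, dtype=float)    # local abscissa 0 .. 2m
    V = np.vander(t, poly_order + 1, increasing=True)  # V[i, k] = t_i**k
    tc = float(half_window)                            # evaluate at the centre t = m
    # Riemann-Liouville: D^alpha t^k = Gamma(k+1) / Gamma(k+1-alpha) * t^(k-alpha);
    # for integer alpha the terms with k < alpha vanish, as in ordinary differentiation
    g = np.array([gamma(k + 1) * _rgamma(k + 1 - alpha) * tc ** (k - alpha)
                  for k in range(poly_order + 1)])
    # least-squares coefficients are pinv(V) @ y, so the derivative is linear in y
    return (g @ np.linalg.pinv(V)) / step ** alpha


def fractional_sg_matrix(n_points, half_window, poly_order, alpha, step=1.0):
    """Band matrix D of shape (n_points, n_points - 2m); X @ D differentiates each row."""
    w = fractional_sg_weights(half_window, poly_order, alpha, step)
    n_out = n_points - 2 * half_window
    D = np.zeros((n_points, n_out))
    for j in range(n_out):
        D[j:j + w.size, j] = w  # column j uses points j .. j+2m, centred on point j+m
    return D
```

For a spectra matrix X of shape (n_samples, n_wavelengths), X @ fractional_sg_matrix(n_wavelengths, m, p, alpha) returns the differentiated spectra for wavelengths m to n_wavelengths - m - 1; the `step` argument rescales by the wavelength spacing h through the factor h^(-α).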
The results showed that, compared with ordinary Savitzky-Golay differentiation, the proposed method retains more spectral detail and yields lower root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP), especially for non-chemical properties such as viscosity, density and hardness.
A new variable selection method, stability competitive adaptive reweighted sampling (SCARS), was proposed. SCARS selects variables by a stability index, defined as the absolute value of a variable's regression coefficient divided by the coefficient's standard deviation. The algorithm consists of a number of loops. In each loop, the stability of every variable is computed; based on stability, enforced wavelength selection and adaptive reweighted sampling (ARS) are then used to select the important variables, which are kept as a variable subset and passed to the next loop. After all loops have run, a series of variable subsets has been obtained, and the RMSECV of the partial least squares (PLS) model built on each subset is computed. The subset with the lowest RMSECV is taken as the optimal variable subset. The performance of SCARS was evaluated on three NIR datasets: tobacco, corn and wheat. The results show that SCARS gives lower RMSECV and RMSEP than moving window PLS (MWPLS), Monte Carlo uninformative variable elimination (MCUVE) and competitive adaptive reweighted sampling (CARS).
The overfitting caused by variable selection was also explored. SCARS, CARS and MCUVE were applied to select variables from a dataset of randomly generated variables that carries no class information.
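The SCARS loop described above can be sketched in Python as follows. This is an illustration under assumptions, not the dissertation's code: the stability of each variable is taken as |mean| / standard deviation of its PLS coefficient over `n_mc` Monte Carlo subsamples of the calibration set, enforced wavelength selection follows the exponentially decreasing schedule used in CARS (the kept fraction falls from 1 to 2/p), the number of PLS components is fixed, and RMSECV comes from simple interleaved 5-fold cross-validation. All function and parameter names are hypothetical.

```python
import numpy as np


def pls1_coef(X, y, n_comp):
    """NIPALS PLS1: regression vector b and intercept b0 with y ~ X @ b + b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p, q = Xc.T @ t / tt, yc @ t / tt
        Xc, yc = Xc - np.outer(t, p), yc - q * t  # deflate X and y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)
    return b, y_mean - x_mean @ b


def rmsecv(X, y, n_comp, n_folds=5):
    """RMSECV from simple interleaved k-fold cross-validation (assumed scheme)."""
    fold = np.arange(len(y)) % n_folds
    sse = 0.0
    for f in range(n_folds):
        test = fold == f
        b, b0 = pls1_coef(X[~test], y[~test], n_comp)
        sse += np.sum((y[test] - X[test] @ b - b0) ** 2)
    return np.sqrt(sse / len(y))


def scars(X, y, n_loops=30, n_comp=3, n_mc=20, frac=0.8, seed=0):
    """Return (lowest-RMSECV variable subset, RMSECV of every loop's subset); n_loops >= 2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kept = np.arange(p)
    k = np.log(p / 2) / (n_loops - 1)  # CARS-style schedule: kept ratio 1 -> 2/p
    subsets, errors = [], []
    for i in range(1, n_loops + 1):
        a = min(n_comp, kept.size)
        B = []
        for _ in range(n_mc):  # Monte Carlo subsamples of the calibration set
            rows = rng.choice(n, int(frac * n), replace=False)
            B.append(pls1_coef(X[np.ix_(rows, kept)], y[rows], a)[0])
        B = np.array(B)
        stab = np.abs(B.mean(axis=0)) / (B.std(axis=0) + 1e-12)  # stability index
        n_keep = min(kept.size, max(2, int(round(p * np.exp(-k * (i - 1))))))
        top = np.argsort(stab)[::-1][:n_keep]  # enforced removal of the least stable
        w = stab[top] / stab[top].sum()
        # ARS: draw with replacement, weighted by stability; unique draws survive
        kept = np.unique(rng.choice(kept[top], size=n_keep, p=w))
        subsets.append(kept)
        errors.append(rmsecv(X[:, kept], y, min(n_comp, kept.size)))
    return subsets[int(np.argmin(errors))], errors
```

Calling `scars` on a purely random X and y is one way to reproduce the overfitting experiment introduced above in miniature: because the reported RMSECV is the minimum over many candidate subsets, it tends to be optimistic.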
Surprisingly, even for this dataset without class information, the variable selection methods could still select some "good" variable combinations that separate the "two classes" with "low" prediction errors, and the prediction errors decreased as the number of raw variables increased. The same happened in regression: when random variables without any regression information were generated, SCARS still selected "good" variable combinations with low prediction errors. In essence, obtaining "good" variable combinations from uninformative variables is overfitting.
To investigate the causes of this overfitting and methods for diagnosing it, simulated data were generated by adding uninformative data to the raw tobacco spectra at different ratios. The simulated data were divided into two parts, a calibration set and an independent test set. Variable selection was then performed, and the variation path of RMSECV on the calibration set was compared with the corresponding path of RMSEP on the independent test set. The results show that when the ratio of uninformative data to spectra is small (at most 0.02 when noise is used as the uninformative data, and at most 0.1 when randomly permuted spectra are used), the RMSECV paths are similar to the RMSEP paths. When the ratio exceeds 0.02 for noise or 0.1 for randomly permuted spectra, the RMSECV paths differ from the RMSEP paths. Comparing the two paths can therefore be used to evaluate variable selection: high similarity indicates that variable selection is effective, while low similarity indicates that it is ineffective.
For calibration transfer, a new method was proposed that corrects the informative components instead of the full spectra.
The method uses PLS for a single response vector (PLS1) to extract the informative components related to the predicted property from the raw spectra, and then corrects these informative components with a spectral transfer method such as canonical correlation analysis (CCA), direct standardization (DS) or PLS for a response matrix (PLS2). The algorithm was tested on three datasets: a corn dataset, a tri-component solvent dataset and a dataset of dimethyl fumarate in milk. The results show that correcting the informative components reduces errors significantly compared with correcting the full spectra.
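One possible reading of the informative-component correction is sketched below in Python. It is an assumption, not the dissertation's algorithm: the informative part of a spectrum is taken as the PLS1 reconstruction t P^T from a model fitted on the master instrument (scores t = (x - x_mean) R with R = W (P^T W)^-1), both master and slave transfer-standard spectra are projected with that same model, and DS is used as the spectral transfer step (CCA or PLS2 could replace it). Because t P^T R = t, predictions from the informative part equal those from the full master spectrum. All names are hypothetical.

```python
import numpy as np


def pls1(X, y, n_comp):
    """NIPALS PLS1; returns the model tuple (x_mean, y_mean, W, P, q)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p, qa = Xc.T @ t / tt, yc @ t / tt
        Xc, yc = Xc - np.outer(t, p), yc - qa * t  # deflate X and y
        W.append(w); P.append(p); q.append(qa)
    return x_mean, y_mean, np.array(W).T, np.array(P).T, np.array(q)


def score_projection(model):
    """R such that scores t = (x - x_mean) @ R, with R = W (P^T W)^-1."""
    _, _, W, P, _ = model
    return W @ np.linalg.inv(P.T @ W)


def informative_part(X, model):
    """Mean-centred, y-related part of the spectra, t P^T (the remainder is discarded)."""
    x_mean, _, _, P, _ = model
    return (X - x_mean) @ score_projection(model) @ P.T


def ds_transfer_matrix(master_std, slave_std, model, rcond=1e-10):
    """DS on informative parts: F minimises ||S_slave F - S_master|| in least squares."""
    Sm = informative_part(master_std, model)
    Ss = informative_part(slave_std, model)
    # the informative parts have rank n_comp, so small singular values are truncated
    return np.linalg.pinv(Ss, rcond=rcond) @ Sm


def predict_informative(S, model):
    """Predict y from (corrected) informative parts: y = S R q + y_mean."""
    _, y_mean, _, _, q = model
    return S @ score_projection(model) @ q + y_mean
```

For new slave spectra X_new, the corrected prediction would be predict_informative(informative_part(X_new, model) @ F, model). DS without an intercept is used here for brevity, so a constant slave offset may only be approximately corrected.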
Keywords/Search Tags: NIR spectroscopy, fractional order Savitzky-Golay differentiation, SCARS, overfitting, calibration transfer based on informative components