| Near-infrared (NIR) spectroscopy has been widely used in analytical science because it is fast, accurate, environmentally friendly, and simple. Because it is difficult to extract quantitative information from NIR spectra, chemometric methods such as multivariate calibration have been used extensively. Multivariate calibration yields a quantitative model relating the spectra to the target property, and the model can then be used to analyze unknown samples. To build robust NIR quantitative models, this dissertation studies variable selection, consensus methods, and outlier detection, among other topics. A consensus ordered predictor selection method is proposed for dealing with the NIR spectra of complex systems. In this method, the consensus result is used as the prediction result, and the influence of three weighting methods on the consensus result is investigated. The superiority of the proposed method is demonstrated on three NIR spectral datasets. A strategy for improving the performance of consensus methods in multivariate calibration of NIR spectra is proposed. In this approach, the spectra are first reduced by uninformative variable elimination (UVE), and then, for each variable in the reduced spectra, a subset of non-collinear variables is generated using the successive projections algorithm (SPA). Sub-models are then built for each variable subset using a calibration subset determined by Monte Carlo (MC) resampling, and the sub-model that produces the minimal cross-validation error is selected as a member model. By repeating the MC resampling, a series of member models is built, and a consensus model is obtained by averaging all the member models. Because each member model is built with the best variable subset and a randomly selected calibration subset, both the quality and the diversity of the member models are ensured for the consensus model. Two NIR spectral datasets of tobacco lamina are used to investigate the proposed method. 
The superiority of the method in both accuracy and reliability is demonstrated. The sparse partial least squares method builds multivariate calibration models with selected informative variables. A modified sparse partial least squares method based on covariance is proposed for NIR spectroscopic analysis. In this method, uninformative variables are eliminated during the modeling process based on the covariance between the spectra and the target. Compared with the conventional sparse partial least squares method, the proposed method is more parsimonious and more accurate. With three NIR datasets and a Raman spectroscopic dataset, the method is shown to be a promising way of dealing with uninformative variables in complex spectroscopic analysis. An outlier detection method is proposed for near-infrared spectral analysis. The method is based on the definition of an outlier and the principle of partial least squares (PLS) regression, i.e., an outlier in a dataset behaves differently from the rest, and the prediction result of a PLS model is the accumulated contribution of several independent latent variables. Therefore, the proposed method builds a PLS model with a calibration dataset, and then the contribution of each latent variable is investigated. Outliers can be detected by comparing these contributions. Three NIR datasets, of gasoline, beverage, and tobacco lamina, are used to test the method. It is found that the quality of the models can be improved by removing the outliers detected by the method. Indirect modeling of trace components in real samples by near-infrared spectroscopy has attracted much interest, because it may provide a rapid way of analyzing industrial or agricultural products. By coupling near-infrared diffuse reflectance spectroscopy with chemometric techniques, a method for the rapid analysis of four tobacco-specific N-nitrosamines (TSNAs) and their total content is studied in this work. 
To optimize the models, techniques for spectral preprocessing and variable selection are adopted and compared. It is found that removing the varying background and correcting the multiplicative scattering effect in the spectra are important in the modeling, and that variable selection can significantly improve the models. To validate the models, the TSNA contents of independent test samples and of tobacco leaves harvested in a different year are predicted. The predicted contents are consistent with the reference contents obtained by GC/TEA analysis. Although the relative errors for some low-content samples are not fully satisfactory, the method is a practical alternative for industrial analysis because it is non-destructive and rapid. |
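The correction of the multiplicative scattering effect mentioned above can be illustrated with a minimal sketch. It assumes the standard multiplicative scatter correction (MSC) against the mean spectrum; the abstract does not state which correction procedure was actually used, and the function name `msc` and the list-of-lists input format are hypothetical choices made for this illustration only.

```python
from statistics import fmean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra: list of equal-length lists of absorbance values, one per sample.
    Returns the corrected spectra as a new list of lists.
    (Assumed procedure; the source does not specify its correction method.)
    """
    n_points = len(spectra[0])
    # Reference spectrum: the pointwise mean over all samples.
    reference = [fmean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = fmean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # Least-squares fit of s ~ offset + slope * reference.
        slope = sum(
            (r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)
        ) / ref_ss
        offset = s_mean - slope * ref_mean
        # Remove the additive offset and the multiplicative scatter factor.
        corrected.append([(x - offset) / slope for x in s])
    return corrected
```

Each spectrum is regressed on the reference, and the fitted offset and slope are then removed, so spectra that differ only by an additive baseline and a multiplicative factor collapse onto the same curve.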