The research in this thesis focuses on new chemometric algorithms for multivariate calibration and on the application of two-way data analysis methods to the evaluation of chromatographic separations.

The representativeness of training samples for multivariate calibration is discussed, and the concept of weighted sampling is introduced to multivariate calibration. Because spectral data spaces are high-dimensional and complex and the sampling process is uncertain, it is difficult to evaluate how well the training samples represent the whole sample space, and the selection of representative training samples depends largely on empirical methods. If the training samples fail to represent the sample space, predictions for new samples can be degraded. To address this problem, a new algorithm for multivariate calibration is developed by combining optimized sampling with partial least squares (PLS): the original training samples are non-negatively weighted, and the complexity and predictive ability of the model are considered simultaneously. Moreover, it is proved that weighted sampling can be achieved by multiplying both the spectrum and the concentration value of a sample by the same non-negative constant, which makes sample-weighted models much easier to compute. Two real data sets are investigated, and the results demonstrate that sample-weighted PLS models can improve predictive ability when the original calibration samples are poorly representative.

Based on the particle swarm optimization (PSO) algorithm, variable weighting is proposed as a more flexible alternative to variable selection. A re-examination of traditional variable selection methods shows that the variables included in the model are in effect weighted with ones and those excluded from the model are weighted with zeros.
If continuous non-negative weights are allowed, traditional variable selection is just a special case of variable weighting. Because the variable weights are determined so as to optimize simultaneously the fit to the calibration set and the prediction of the validation set, variable weighting can be seen, in a certain sense, as an optimized rescaling of the variables and is therefore more flexible than traditional variable selection. Results obtained from real data sets indicate that variable-weighted PLS (VW-PLS) can not only play the same role as variable selection but also maintain the multi-channel advantage by including more variables in the model.

Stacked regression, a new machine learning method, is improved and then introduced to multivariate calibration to achieve automatic and fast spectral interval selection. Instead of traditional cross validation (CV), Monte Carlo cross validation (MCCV) is adopted in the improved stacked regression, which is then used to combine regression models built on different spectral intervals. Subject to non-negativity constraints on the combination coefficients, the resulting combined model has the minimum root mean squared error of MCCV (RMSEMCCV), so it is expected to generalize well and to carry less risk of overfitting. The combination coefficients are obtained by non-negative least squares (NNLS), and spectral interval selection is achieved because some coefficients are set to zero. Moreover, because the MCCV of a linearly combined model can be obtained by linearly combining the MCCV results of the separate interval models, which is much simpler to compute, MCCV stacked regression is computationally economical. The practicability of the proposed method is demonstrated by its application to two real data sets.

A new concept of data preprocessing for multivariate calibration, ensemble preprocessing, is proposed.
Because raw near-infrared (NIR) spectra are often affected by factors such as background, baseline shifts and noise, the raw data must be preprocessed properly in multivariate calibration. However, owing to the complexity of NIR data and the lack of prior information, finding the optimal preprocessing is still a matter of trial and error and relies on the experience of practitioners. Another disadvantage of traditional preprocessing is that any single method carries a risk of information loss and may degrade the data in some respects while improving them in others. Moreover, models based on a single preprocessing method are sometimes unstable when predicting new samples. To solve these problems and to automate the selection and optimization of preprocessing methods, an ensemble preprocessing method is developed by combining calibration models based on different preprocessing methods through MCCV stacked regression. Results obtained from real data sets demonstrate that, compared with traditional preprocessing using a single method, ensemble preprocessing can lead to a more stable calibration model while maintaining or improving its precision.

Moving window partial least squares regression (MWPLSR) is introduced to calibration transfer to develop a stable, low-complexity global calibration model. When an existing calibration model is applied to new samples containing spectral variations that were not calibrated, it should be adjusted to avoid bias and serious errors. MWPLSR can select concentration-correlated spectral intervals and reduce the complexity of the global calibration model. Investigation of two benchmark data sets confirms that the global calibration model based on MWPLSR has these advantages and can achieve stable and reliable calibration transfer.

The disadvantages of traditional chromatographic separation criteria based on chromatograms recorded by single-channel detectors are discussed.
It is further pointed out that many of these problems are caused by a lack of information about the number of components, peak purity and degree of overlap when peaks are seriously overlapped. The applications of two-way chemometric methods to assessing chromatographic separation quality are then reviewed, and some important problems involved are discussed on the basis of the literature and our research experience.

A new chromatographic separation criterion, peak-purity weighted resolution (PPWR), based on a rank graph, is proposed. Compared with traditional separation criteria based on one-way chromatograms, the advantage of PPWR is that it takes into account information about the number of components, peak purity and degree of overlap, which is difficult to obtain from one-way chromatograms with serious overlaps. PPWR is applied to a simulated data set and a real chromatographic system, and the results indicate that PPWR is a reasonable separation criterion for seriously overlapped peaks and reflects the degree of overlap. Finally, some important problems that might be encountered when using PPWR are discussed.
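
As a small illustration of the sample-weighting result summarized earlier in this abstract (multiplying a sample's spectrum and concentration by the same non-negative constant), the sketch below uses ordinary least squares with a single predictor and no intercept. It is not the sample-weighted PLS algorithm of the thesis. The toy data, the constants `c` and the helper `fit_slope` are hypothetical and chosen only for illustration. Under these assumptions, scaling sample i by a constant c_i gives the same fit as weighting its squared residual by c_i squared.

```python
def fit_slope(x, y, weights=None):
    """Least-squares slope through the origin, with optional non-negative sample weights."""
    if weights is None:
        weights = [1.0] * len(x)
    num = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    den = sum(w * xi * xi for w, xi in zip(weights, x))
    return num / den


# Hypothetical toy data: one "spectral" variable x and a concentration y per sample.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.7]

# Non-negative per-sample constants (assumed values); a constant of 0 drops the sample.
c = [1.0, 0.5, 2.0, 0.0]

# Route 1: multiply each sample's x and y by its constant, then fit without weights.
x_scaled = [ci * xi for ci, xi in zip(c, x)]
y_scaled = [ci * yi for ci, yi in zip(c, y)]
slope_scaled = fit_slope(x_scaled, y_scaled)

# Route 2: fit the unscaled data with explicit sample weights c_i ** 2.
slope_weighted = fit_slope(x, y, weights=[ci ** 2 for ci in c])

# Reference: leaving out the zero-weighted sample entirely gives the same slope.
slope_without_last = fit_slope(x[:3], y[:3], weights=[ci ** 2 for ci in c[:3]])
```

In this simplified setting the two routes agree up to floating-point rounding, which is why an unweighted fitting routine can be reused on rescaled samples instead of implementing a separate weighted solver.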