
New Chemometric Algorithms For The Analysis Of Multi-way Data Arrays In Analytical Chemistry

Posted on: 2006-08-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Wang
GTID: 1101360182470252
Subject: Analytical Chemistry
Abstract/Summary:
With the emergence of many hyphenated instruments, analysts can easily obtain very large analytical data matrices consisting of hundreds or even thousands of data points. These data matrices or data arrays contain abundant chemical information, including the number of chemical components and the pure spectra, chromatograms and contents of those components. However, it is hard to extract this information from such large data matrices with conventional data processing techniques alone. Analysts have to resort to chemometrics, a sub-branch of chemistry that emerged in the 1970s. As an interface between chemistry and mathematics, statistics and computer science, chemometrics designs and selects optimal schemes for chemical measurements and extracts as much chemical information as possible from the data. As chemometrics has evolved, its methodologies have enriched the fundamental theory of modern analytical chemistry. Among chemometric methodologies, multi-way data analysis is one of the most active areas of practical significance in analytical chemistry. Two-way and three-way data analysis have attracted wide interest for the resolution and calibration of multi-component systems. These approaches provide a promising tool for the direct analysis of so-called "grey" and "black" analytical systems. Since chemical data characterize chemical systems, incorporating a priori chemical information into chemometric algorithms has become an important trend in multi-way data analysis. The present thesis primarily involves the following aspects of multi-way data analysis in analytical chemistry:

1. Two-way data analysis (Chapter 1 to Chapter 3): A chromatographic peak located inside another peak in the time direction is called an embedded or inner peak, in distinction to the embedding peak, which is called an outer peak.
The chemical components corresponding to inner and outer peaks are called inner and outer components, respectively. The ultraviolet-visible and near-infrared spectra of chemical compounds are band spectra, while mass spectra are discrete. If the inner and outer components give different signals on different measuring channels, selective channels that represent pure chromatograms may exist. Based on this a priori chemical information, the inner chromatogram projection (ICP) method is proposed for the resolution of GC-MS data with embedded chromatographic peaks. ICP achieves satisfactory performance that is not affected by the shapes of the chromatograms or the relative position of the two components. It can be used to resolve any pattern of embedded chromatograms when a mass spectrometer is the detector. In two-way data analysis, pure spectra are also referred to as pure variables. After any form of normalization, the two-way data points lie on a hyper-"spherical" surface whose vertices are the pure variables. A resolution procedure named vertex vector sequential projection (VVSP) is developed to determine pure variables in two-way data by making full use of this geometry. Since selective regions commonly exist in the time direction, VVSP can ascertain the pure variables one by one and then refine them through an iterative optimization procedure. The proposed method is proved to be a competent tool for the resolution of two-way data. Additionally, VVSP does not require the ascertainment of feature regions, and its principle and implementation are straightforward. For the determination of elution windows and patterns, the pure spectrum evolving projection (PSEP) method is proposed.
PSEP tries to find pure spectra or pure projected spectra and uses the evolving projection method to find the elution windows of the overlapping chromatograms component by component. PSEP can locate the starting and ending elution points of all components; more importantly, it gives a direct indication of elution patterns. PSEP has proved to be a useful tool for discovering the elution windows in two-way data.

2. Three-way data analysis (Chapter 4 to Chapter 8): The most important prerequisite for three-way data analysis is that the data arrays strictly follow the trilinear model. To improve the accuracy and reliability of the decomposition of three-way data contaminated by nonlinear data or outliers, iterative reweighted parallel factor analysis (IRPARAFAC) is proposed. The basic assumption of the proposed method is that the residuals of data entries contaminated with large deviations are larger than those of the other entries. A cosine function is used to decide the weight of each entry. The IRPARAFAC algorithm iteratively updates the weights as the unmodeled residuals improve. During the iterative procedure, the data entries with large deviations are gradually discovered and assigned small or even zero weights. Hence their influence on the chemical loading parameters is gradually mitigated. The IRPARAFAC algorithm provides a promising tool for the qualitative and quantitative analysis of trilinear data arrays containing nonlinear data or outliers. Chromatographic shifts can hardly be avoided, because the stability of both the operator and the instrument cannot always be guaranteed from run to run. If the shifting is severe, trilinearity is no longer satisfied. To solve this problem, the VVSP method is applied to the analysis of three-way chromatographic data.
The three-way data array is unfolded along a certain direction into one matrix, giving a multi-bilinear model. The VVSP method is then used to select the pure variables and iteratively improve the fit of the data to the multi-bilinear model. The multi-bilinear model guarantees that the chromatograms in each sample can be resolved separately, which circumvents the model deficiency caused by retention time shifts. Results on both simulated and real chemical data sets demonstrate that the proposed method is more efficient than PARAFAC when the chromatographic shifts are very severe. If the chromatographic shifts are slight or have been adjusted, the three-way data can be regarded as decomposable in the trilinear domain. Thus trilinear evolving factor analysis (TEFA) is proposed, making use of the trilinearity of three-way data and the evolving nature of chromatography. Compared with a two-way matrix, a three-way data array provides a matrix at each point along the time direction. This higher dimensionality makes it possible to conduct singular value decomposition (SVD) at each elution detection point, whereas in two-way resolution a number of neighboring profiles are needed to perform SVD. The rank map of three-way data can therefore be obtained by direct rank analysis of the matrix at each time point. Provided trilinearity is guaranteed, accurate elution information can be obtained from the rank map. Additionally, this method does not require the selection of a window size, which affects selectivity and sensitivity in the two-way case. From the rank maps, selective regions as well as spectral and concentration profiles can be determined, and the coupled vector resolution (COVER) method can then be used to resolve the chromatograms. The COVER method needs pure variables of two dimensions or at least one calibration sample to achieve the resolution.
This requirement can be met by the information acquired from the trilinear rank map. TEFA-COVER realizes the idea of resolving profiles component by component by deducting the components already resolved. As a result, it achieves the direct resolution of a "black" system. The number of components is also called the chemical rank of the three-way data. The determination of chemical rank is crucial to the decomposition of three-way data arrays. Thus the chemical subspace projection (CSP) method is proposed for the determination of chemical rank in three-way data arrays. The proposed method projects the unfolded three-way data onto chemically meaningful subspaces and determines the chemical rank by checking the lengths of the projected vectors. The proposed method is simple to use and can give an accurate estimate of the number of components in an ordinary three-way data array.
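
The rank-map idea behind TEFA (a direct rank analysis of the matrix found at each time point of a three-way array) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `rank_map`, the `rel_tol` threshold, the array layout (time x samples x channels) and the noise-free synthetic two-component data are all assumptions made for illustration, and counting singular values above a fixed threshold is a simplification of a real rank analysis.

```python
import numpy as np

def rank_map(X, rel_tol=1e-6):
    """Estimate the rank of the (samples x channels) slice at each time point.

    X has shape (n_times, n_samples, n_channels). The layout and the
    threshold rule are illustrative assumptions, not the thesis's method.
    """
    # Batched SVD: one set of singular values per time slice.
    svals = np.linalg.svd(X, compute_uv=False)
    # Count singular values above a threshold tied to the largest one overall.
    threshold = rel_tol * svals.max()
    return (svals > threshold).sum(axis=1)

# Synthetic, noise-free trilinear data with two overlapping components.
t = np.arange(40)
C = np.zeros((40, 2))                      # elution profiles (time x component)
C[5:25, 0] = np.exp(-0.5 * ((t[5:25] - 15) / 4.0) ** 2)
C[15:35, 1] = np.exp(-0.5 * ((t[15:35] - 25) / 4.0) ** 2)

A = np.array([[1.0, 0.2],                  # relative concentrations
              [0.5, 1.0],                  # (sample x component)
              [0.8, 0.6],
              [0.3, 0.9]])

ch = np.arange(30)
S = np.stack([np.exp(-0.5 * ((ch - 10) / 3.0) ** 2),   # spectra
              np.exp(-0.5 * ((ch - 20) / 3.0) ** 2)],  # (channel x component)
             axis=1)

# Trilinear model: X[t, k, j] = sum_n C[t, n] * A[k, n] * S[j, n]
X = np.einsum("tn,kn,jn->tkj", C, A, S)

ranks = rank_map(X)
print(ranks)
```

In this toy case the estimated rank is 0 before and after elution, 1 where only one component elutes (a selective region) and 2 where the two elution windows overlap. With real, noisy data the fixed threshold would have to be replaced by a noise-aware criterion.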
Keywords/Search Tags: Multi-way data analysis, Embedded peaks, Vertex vector, Trilinear model, Model deficiency, Retention time shift, Rank map, Chemical rank