New Chemometric Algorithms For Knowledge Discovery From Complex Chemical Data

Posted on: 2006-07-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S E A L M H M D B L K T
GTID: 1101360152970090
Subject: Applied Chemistry
Abstract/Summary:
The development of chemometrics is a sign that applied chemistry and analytical chemistry are entering the information age. The need for improved quantitative information in analytical chemistry and chemical technology requires the transformation of chemical measurements into informative results, i.e. extracting the useful information from the data obtained. Usually, extracting informative results from multivariate data means detecting the natural clusters or outliers in the data, then searching for suitable classification tools for future objects or designing a calibration model that represents the data set. However, these targets cannot be achieved easily, especially with the complex multivariate data sets obtained from most advanced chemical instruments and chemical plants. The present thesis introduces five new chemometric algorithms to enhance knowledge discovery from complex multivariate data.

The first algorithm serves the field of cluster analysis; it is named Bubble Agglomeration (BA). The algorithm treats each data point as the centre of a bubble with radius r. All bubbles have the same size, and each set of contiguous bubbles forms a natural cluster or core. The algorithm gradually increases the bubble radius and, consequently, the number of adjacent bubbles, so the number of cores of the expected clusters decreases. The sparse data points are distributed among the cores obtained according to their distances from the different cores. The optimum bubble radius is determined via the reliability curve. Two simulated data sets and three real ones have been employed to validate the performance of the method. A comparison with K-means cluster analysis showed satisfactory performance of the BA approach.

Undoubtedly, visualization of a multivariate data set in two-dimensional space is a powerful tool not only to detect the natural clusters but also to extract the information embedded in the data set. The second algorithm introduced in this thesis is a new multivariate data display approach based on PCA. The data points can be visualized in a two-dimensional space while being free from the constraint of using only the first two principal components. In such an approach, all principal components carrying chemically important information can be fully utilized in the visualization process. The data are visualized using an (n+1)-sided regular polygon based on the n principal components that embed most of the chemical information. The proposed method has been applied to real chemical data sets, some of which could not be visualized successfully by the conventional PCA method. The results indicated that the proposed method can display the chemical data sets; in particular, it preserves the skeleton of inner distances among the data points relatively better than the conventional PCA methods.

The third algorithm in this thesis is in the field of classification; it is named the multi-parturition genetic algorithm (MPGA) and can be invoked to classify overlapped chemical data. The proposed algorithm first estimates a linear discriminant function. Estimation of the linear discriminant function is achieved by using a genetic algorithm modified by two newly proposed operators, namely, multi-parturition and decimation and orientated creation. Modifying the genetic algorithm improves the linear classification results and simultaneously diminishes the computational time. To circumvent the commonly encountered difficulty in classifying linearly inseparable chemical data sets, the optimized linear classifier is further modified by a complementary nonlinear classifier. The complementary nonlinear counterpart is obtained by erecting half-hyperellipsoids over the linearly misclassified patterns. The proposed MPGA has been applied to classify seven real chemical data sets. Experimental results have indicated that the proposed MPGA could classify seriously overlapped data sets.

The fourth algorithm improves the multivariate linear calibr...
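
The Bubble Agglomeration idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that two bubbles of radius r are contiguous when their centres are at most 2r apart, that a core is any contiguous group with at least `min_core_size` points (a hypothetical parameter), and that a sparse point joins the core holding its nearest core member. The reliability curve used to choose the optimum radius is not defined in the abstract and is therefore left out.

```python
from math import dist


def bubble_agglomeration(points, r, min_core_size=2):
    """Sketch of Bubble Agglomeration: return one cluster label per point."""
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: bubbles touch when their centres are at most 2r apart.
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= 2 * r:
                parent[find(i)] = find(j)

    # Collect each set of contiguous bubbles.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Assumption: groups below min_core_size are treated as sparse points.
    cores = [g for g in groups.values() if len(g) >= min_core_size]
    sparse = [i for g in groups.values() if len(g) < min_core_size for i in g]

    labels = [-1] * n  # -1 is kept only if no core exists at all
    for k, core in enumerate(cores):
        for i in core:
            labels[i] = k

    # Assumption: a sparse point joins the core with the closest member.
    for i in sparse:
        if cores:
            labels[i] = min(
                range(len(cores)),
                key=lambda k: min(dist(points[i], points[j]) for j in cores[k]),
            )
    return labels


points = [(0, 0), (0.5, 0), (1, 0), (5, 5), (5.5, 5), (2.5, 2.0)]
print(bubble_agglomeration(points, r=0.3))  # two cores plus one sparse point
print(bubble_agglomeration(points, r=5.0))  # larger bubbles merge everything
```

Running the function over an increasing sequence of radii reproduces the behaviour the abstract describes: as r grows, more bubbles become adjacent and the number of cores falls.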
Keywords/Search Tags: Bubble agglomeration, Cluster analysis, Polygon display, Two-dimensional visualization, Nonlinear classification, Multi-parturition, Piece-wise quasi linear modeling, Data splitting, Calibration