Metabolomic Data Mining By Wavelet Transform-Pattern Recognition

Posted on: 2007-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wu
Full Text: PDF
GTID: 2120360212980382
Subject: Biochemical Engineering
Abstract/Summary:
Fully exploiting the significant information carried in large volumes of spectroscopic data is a major challenge in metabonomic research. Taking two genotypes of Arabidopsis thaliana, Co10 and C24, and their first-generation progeny, Co10×C24 and C24×Co10, as an example, this paper explores the practicability and feasibility of introducing the wavelet transform (WT) into bioinformatic research in the field of metabonomics. Conclusions are drawn by comparing the results obtained before and after integrating the wavelet transform with principal component analysis (PCA), hierarchical clustering analysis (HCA) and a BP neural network (BPNN), respectively.

Combining the capability of the wavelet transform for noise reduction and information recovery in the frequency domain with the capability of PCA for dimension reduction and visualization of sample relations, this paper established the WT-PCA method. The results showed that the choice of wavelet affected WT-PCA. With DB8 as the wavelet, WT-PCA distinguished the four genotypes with an accuracy of 90.675%, compared with 46.875% for PCA alone. WT-PCA discriminated the two hybrids with an accuracy of 81.25%, whereas PCA could hardly differentiate them.

This paper also established the WT-HCA method, which highlights the objectivity of HCA results and the strengths of WT. The intragroup distance criterion had an evident impact on the HCA results: the Ward algorithm gave better results than the single, complete and average algorithms. With Ward as the intragroup distance, HCA distinguished the four genotypes and the two hybrids with accuracies of 84.375% and 75%, respectively, while WT-HCA improved these figures to 90.675% and 81.25%.

Drawing on the strengths of BPNN in non-linear learning and self-adaptation and of WT in optimizing the inputs, the WT-BPNN method was established. A WT-BPNN model with five hidden layers, built under leave-one-out (LOO) validation, predicted unknown samples from the four genotypes with a correct prediction rate of 100%, while a BPNN with the same structure achieved an accuracy of 81.25%.
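
The WT-PCA idea described above (denoise each spectrum in the wavelet domain, then project the samples onto principal components) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the thesis's actual procedure: the abstract reports using the DB8 wavelet, whereas this sketch swaps in the simpler Haar wavelet so that it needs only the Python standard library; the spectra are synthetic; and the decomposition depth, the soft threshold, and the function names are hypothetical. Only the first principal component is computed, by power iteration.

```python
import math
import random


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def wavelet_denoise(signal, levels=2, threshold=0.5):
    """Soft-threshold the detail coefficients, then reconstruct.

    Assumes len(signal) is divisible by 2**levels. The threshold value is
    a hypothetical choice, not one taken from the thesis.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(
            [math.copysign(max(abs(c) - threshold, 0.0), c) for c in d]
        )
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx


def first_pc_scores(rows, iterations=200):
    """Scores of each sample on the first principal component.

    Uses power iteration on X^T X of the mean-centered data; the uniform
    start vector assumes it is not orthogonal to the leading component.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        proj = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Synthetic stand-in for spectra: two groups differing in one peak height.
rng = random.Random(0)


def spectrum(peak_height):
    return [
        peak_height * math.exp(-(((i - 8) / 2.0) ** 2)) + rng.gauss(0, 0.3)
        for i in range(16)
    ]


group_a = [spectrum(1.0) for _ in range(4)]
group_b = [spectrum(3.0) for _ in range(4)]
denoised = [wavelet_denoise(s) for s in group_a + group_b]
scores = first_pc_scores(denoised)
```

In this toy setting the first four entries of `scores` belong to one group and the last four to the other, so the separation can be inspected directly. For real spectra, an established wavelet library offering Daubechies wavelets and a full PCA implementation would be the more natural tools.
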
Keywords/Search Tags: metabonomics, metabolomics, wavelet transform, pattern recognition, principal component analysis, hierarchical clustering analysis, artificial neural network