
Research On Key Issues Of Metabolomics Data Mining

Posted on: 2012-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cao
Full Text: PDF
GTID: 2234330392951800
Subject: Biomedical engineering
Abstract/Summary:
Metabolomics, which emerged in the last century following genomics, metagenomics, and proteomics, is a new interdisciplinary science and a vital part of systems biology. In recent years, it has quickly become a hot topic in life science research.

Similar to other x-omics, metabolomics is a highly interdisciplinary area related to analytical chemistry, chemometrics, physiological pathology, and other fields. The complete metabolomics workflow mainly includes sample collection, pretreatment, data collection, data analysis, and data interpretation, based on instrumental analysis and a data mining platform. With the rapid improvement of typical analytical instruments, the amount of data grows exponentially, and the difficulty of data mining increases dramatically with it. This dissertation introduces the metabolomic data mining process and studies several key issues. The main contents are as follows:

[1] The accuracy, stability, prediction ability, and over-fitting of four typical classifiers were compared and evaluated comprehensively on liver cancer and colon and rectal cancer datasets, with the aim of providing guidance on selecting a metabolic profiling analysis method. The classifiers are PLS (Partial Least Squares regression), LDA (Linear Discriminant Analysis), SVM (Support Vector Machines), and RF (Random Forests). Classifier performance on feature selection and ranking was tested as well.

[2] Machine learning methods and the Pearson correlation coefficient were used to select optimal metabolomics experimental conditions. Two methodological datasets were involved, derived from optimization experiments on the extraction reagents and the derivatization conditions.

[3] Two special-purpose software tools for metabolomic data mining were developed in MATLAB, each with a GUI.

One is for multi-dimensional statistical analysis of metabolomic data and contains three data preprocessing methods, four classifiers, and several model evaluation approaches. The evaluation approaches include cross validation, permutation, and ROC. The software is easy to operate and has a friendly interface.

The other is for standard curve fitting and batch quantitation of metabolites. This software automatically uses the intensity information of multiple compounds, acquired from high-throughput analytical instruments, to build a standard curve for every compound and then calculates their concentrations from these curves. The tool lightens researchers' burden of repetitive, time-consuming, and labor-intensive operations and helps improve efficiency.
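
The standard-curve and batch-quantitation step described above can be illustrated with a minimal Python sketch. It is not the thesis software, which was written in MATLAB. It assumes a linear response (intensity = slope × concentration + intercept) fitted by ordinary least squares, whereas the abstract does not state which curve model was used. The function names, compound names, and numbers are hypothetical.

```python
def fit_standard_curve(concentrations, intensities):
    """Fit intensity = slope * concentration + intercept by least squares.

    Assumption: the detector response is linear over the calibrated range.
    """
    n = len(concentrations)
    if n < 2 or n != len(intensities):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("calibration concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def batch_quantify(standards, samples):
    """Build one curve per compound, then convert sample intensities.

    standards: {compound: (concentrations, intensities)}
    samples:   {sample_id: {compound: measured_intensity}}
    Returns (curves, concentrations) with the same nesting as the inputs.
    """
    curves = {name: fit_standard_curve(conc, inten)
              for name, (conc, inten) in standards.items()}
    results = {}
    for sample_id, measured in samples.items():
        results[sample_id] = {}
        for name, intensity in measured.items():
            slope, intercept = curves[name]
            if slope == 0:
                raise ValueError(f"flat standard curve for {name}")
            # Invert the fitted line to recover the concentration.
            results[sample_id][name] = (intensity - intercept) / slope
    return curves, results


if __name__ == "__main__":
    # Hypothetical calibration data for two compounds.
    standards = {
        "alanine": ([1, 2, 4, 8], [15, 25, 45, 85]),
        "glucose": ([1, 2, 4, 8], [4, 7, 13, 25]),
    }
    samples = {"S1": {"alanine": 55, "glucose": 16}}
    curves, concentrations = batch_quantify(standards, samples)
    print(curves)
    print(concentrations)
```

A real tool would also need to report goodness of fit for each curve and flag samples whose intensities fall outside the calibrated range. Both are omitted here for brevity.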
Keywords/Search Tags: metabolomics, data mining, machine learning, batch quantitation