Research On Key Issues Of Metabolomics Data Mining

Posted on:2012-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y Cao

GTID:2234330392951800

Subject:Biomedical engineering

Abstract/Summary:

Metabolomics, which developed in the last century following genomics, metagenomics, and proteomics, is a new interdisciplinary science and a vital part of systems biology. In recent years, it has quickly become a hot topic in life science research.

Like other omics fields, metabolomics is highly interdisciplinary, drawing on analytical chemistry, chemometrics, pathophysiology, and other fields. The complete metabolomics workflow mainly includes sample collection, pretreatment, data collection, data analysis, and data interpretation, and relies on instrumental analysis and a data mining platform. With the rapid improvement of typical analytical instruments, the amount of data grows exponentially, and the difficulty of data mining increases dramatically. This dissertation introduces the metabolomic data mining process and studies several key issues. The main contents are as follows:

[1] The accuracy, stability, predictive ability, and over-fitting of four typical classifiers were compared and evaluated comprehensively on liver cancer and colorectal cancer datasets, with the aim of providing guidance on the selection of metabolic profiling analysis methods. The classifiers are PLS (partial least squares regression), LDA (linear discriminant analysis), SVM (support vector machines), and RF (random forests). Classifier performance in feature selection and ranking was tested as well.

[2] Machine learning methods and the Pearson correlation coefficient were used to select optimal conditions for metabolomics experiments. Two methodological data sets were used, derived from optimization experiments on the extraction reagents and the derivatization conditions.

[3] Two dedicated software tools for metabolomic data mining were developed in MATLAB, each with a GUI.

The first is for multi-dimensional statistical analysis of metabolomic data. It contains three data preprocessing methods, four classifiers, and several model evaluation approaches, including cross validation, permutation testing, and ROC analysis. The software is easy to operate and has a friendly interface.

The second is for standard curve fitting and batch quantitation of metabolites. It automatically uses the intensity information of multiple compounds acquired from high-throughput analytical instruments to build a standard curve for each compound, and then calculates the compound concentrations from these curves. The tool lightens the burden of repetitive, time-consuming, and labor-intensive operations on researchers and helps improve efficiency.
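The standard curve fitting and batch quantitation step can be illustrated with a minimal sketch. It is not the thesis software, which was written in MATLAB and is not reproduced here. The sketch assumes a linear calibration model fitted by ordinary least squares; the abstract does not state the curve model. All function names, compound names, and numbers below are hypothetical.

```python
from statistics import mean


def fit_standard_curve(concentrations, intensities):
    """Fit intensity = slope * concentration + intercept by ordinary least squares.

    Assumption: a linear calibration model; the source does not say which
    curve model the original tool uses.
    """
    if len(concentrations) != len(intensities):
        raise ValueError("concentrations and intensities must have equal length")
    x_bar = mean(concentrations)
    y_bar = mean(intensities)
    sxx = sum((x - x_bar) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("need at least two distinct standard concentrations")
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def quantify(intensity, slope, intercept):
    """Invert the fitted line to estimate a concentration from an intensity."""
    if slope == 0:
        raise ValueError("a flat standard curve cannot be inverted")
    return (intensity - intercept) / slope


def batch_quantify(standards, samples):
    """Fit one curve per compound, then quantify every sample.

    standards: {compound: (concentrations, intensities)}
    samples:   {sample_id: {compound: measured_intensity}}
    Compounds without a standard curve are skipped.
    """
    curves = {name: fit_standard_curve(conc, inten)
              for name, (conc, inten) in standards.items()}
    results = {}
    for sample_id, measured in samples.items():
        results[sample_id] = {
            name: quantify(value, *curves[name])
            for name, value in measured.items()
            if name in curves
        }
    return curves, results


if __name__ == "__main__":
    # Hypothetical calibration standards and sample intensities.
    standards = {
        "alanine": ([1, 2, 4, 8], [10, 20, 40, 80]),
        "glucose": ([0, 5, 10], [3, 13, 23]),
    }
    samples = {"S1": {"alanine": 50, "glucose": 11}}
    curves, results = batch_quantify(standards, samples)
    print(curves)
    print(results)
```

Fitting each compound independently keeps the batch step a simple loop over compounds and samples. A real tool would probably also report goodness of fit and flag intensities that fall outside the calibrated range.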

Keywords/Search Tags:

metabolomics, data mining, machine learning, batch quantitation

Related items

1	Research On Data Mining Of Blood Glucose Spectrum Based On Machine Learning
2	A Research On Rapid And Accurate Analysis Technology Of Trauma Metabolomics Based On Machine Learning
3	Study Of Prestigious Traditional Chinese Physicians' Medication Rules Based On Machine Learning And Data Mining
4	Application Study Of Data Mining In Intelligent Identification Of Metabolic Syndrome In Physical Examination Population
5	Establishment Of Data Mining Model And Its Application In Literatures Of Chinese Medicine
6	Research On The Prediction Of Drug Targets Based On Imbalance Data Mining
7	Application Of Machine Learning Classification Algorithm In Diagnosis Of Liver Disease
8	The Study Of HIV/AIDS Prediction And Control Model In Xinjiang Based On Data Mining Technology
9	Research On Medical Data Classification Algorithm Based On Machine Learning
10	Breast Cancer Analysis And Predictive Diagnosis Based On Data Mining