Classification Of Metabolomics Data And Study Of Variable Selection Methods

Posted on: 2019-11-04; Degree: Master; Type: Thesis
Country: China; Candidate: B Y Zhang; Full Text: PDF
GTID: 2370330566483870; Subject: Probability theory and mathematical statistics
Abstract/Summary:
The analysis of metabolomics data is an important part of metabolomics research. Because the complexity of the output data poses great challenges for subsequent analysis, accurately classifying the data and selecting robust biomarkers are of great significance in metabolomics. This thesis addresses both problems.

First, we study the influence of data structure on the classification of metabolomics data. Data structure is considered from three aspects: the imbalance ratio (the ratio of the negative class to the positive class), the data dimension, and the correlation between variables. Three machine learning algorithms, support vector machine (SVM), partial least squares discriminant analysis and random forest, were used to classify all the data in this part. The results show that all three aspects have a great impact on the classification of metabolomics data, and that the impact is especially serious for minority-class samples.

Second, we propose an algorithm, termed SRS-SVM, that can be applied to the classification of metabolomics data to seek the best classification accuracy while also screening out stable variables (robust biomarkers). SRS-SVM is based on sparse regularization variable selection in combination with subsampling (SRS); classification is then performed by a linear SVM classifier in the space of the selected variables, seeking the maximal classification accuracy. The results show that SRS-SVM outperforms other related algorithms in prediction accuracy as measured by both internal and external validation. Furthermore, the candidate biomarkers selected by SRS-SVM are quite stable, so the method can serve as an alternative for the analysis of metabolomics data, which is significant for applications of metabolomics.
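To make the imbalance ratio concrete, the following is a minimal sketch and not code from the thesis. It computes the ratio of negative to positive samples and shows how overall accuracy can hide a complete failure on the minority class. The function names `imbalance_ratio` and `per_class_recall`, the toy labels, and the all-negative classifier are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def imbalance_ratio(labels, positive=1):
    """Ratio of negative-class to positive-class sample counts."""
    n_pos = Counter(labels)[positive]
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive-class samples")
    return n_neg / n_pos


def per_class_recall(y_true, y_pred):
    """Fraction of each class's samples that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hypothetical toy labels (NOT data from the thesis): 18 negatives, 2 positives.
y_true = [0] * 18 + [1] * 2
# A hypothetical classifier that labels every sample as negative.
y_pred = [0] * 20

ratio = imbalance_ratio(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(f"imbalance ratio: {ratio}")
print(f"overall accuracy: {accuracy}")
print(f"per-class recall: {recall}")
```

In this toy case the overall accuracy looks high even though no minority-class sample is recognised, which is why per-class measures are worth reporting alongside accuracy when the classes are imbalanced.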
Keywords/Search Tags: metabolomics, class-imbalance ratio, dimension, variable correlation, sparse regularization with subsampling