Research On Mutual Information Based Feature Selection And Its Application On Metabolomic Data

Posted on: 2011-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Li
Full Text: PDF
GTID: 2120330332461297
Subject: Computer application technology
Abstract/Summary:
With the progress of the Human Genome Project, scientific research on the phenomena of life has attracted considerable attention. This research covers not only the phenomena and nature of life activities but also the relationships among organisms and between organisms and their environment. Within this field, metabolomics allows researchers to understand metabolic mechanisms through metabolites and metabolic pathways and to find the causes of certain diseases, which supports prevention, diagnosis, and treatment. Extracting biomarkers is an important and difficult problem.
In recent years, data mining techniques have been widely used for data analysis, including in the data preprocessing stage. Among the criteria used to evaluate features, mutual information measures the information shared between two variables. Compared with other criteria, mutual information is insensitive to the concrete values of the training data, which makes it more robust and less easily affected by noise or outliers. For this reason, mutual information has been studied extensively in the data mining community. Building on dynamic mutual information, this thesis combines a dynamic sample space with mRMR (minimum redundancy, maximum relevance) to propose a new feature selection method. At each step, the method prefers the feature with the largest weighted mRMR value among those that improve classification performance. To evaluate the algorithm, experiments with three popular feature selection methods were carried out on five public data sets and five classification models, and the accuracy and AUC values demonstrate the superiority of the proposed method.
Finally, the feature selection method was applied to rat and patient metabolomic liver cancer data sets to extract cancer biomarkers. The experimental results show that classification models built on the selected features achieve good classification results without overfitting. In addition, statistical analysis of part of the extracted features shows that these features differ across three stages of liver disease, that the algorithm performs well, and that the extracted features are biologically significant. The proposed approach is therefore an effective feature selection method.
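The abstract does not give the algorithm in detail, so the following is only a minimal sketch of plain greedy mRMR with mutual information on discrete features. It does not implement the thesis's dynamic sample space or its weighting, and it omits the check that a candidate improves classification performance. The function names `mutual_information` and `mrmr_select` and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR (illustrative): pick up to k feature names.

    `features` maps a feature name to a list of discrete values.
    Score = relevance to the labels minus mean redundancy with
    the features already selected.
    """
    remaining = list(features)
    relevance = {f: mutual_information(features[f], labels) for f in remaining}
    selected = []
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "a" tracks the label, "a_copy" duplicates "a",
# and "b" is nearly independent of both.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
}
chosen = mrmr_select(toy_features, labels, 2)
```

On the toy data the duplicate feature is penalized for redundancy, which is the behavior the mRMR criterion is meant to produce. Real metabolomic measurements are continuous, so they would need discretization or a different mutual information estimator before a sketch like this could be applied.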
Keywords/Search Tags: Feature Selection, Mutual Information, Dynamic Sample Space, Metabolomics