The emergence of modern analytical instruments and advances in computer technology have greatly promoted the development of analytical chemistry and life science. High-throughput instruments such as gene chips, mass spectrometers (recording mass-to-charge ratios), and near-infrared or Raman spectrometers (recording intensities across wavelengths) now provide vast amounts of data about each sample. This raises a new problem: how to select informative variables from such large datasets, and how to build models on them for analysis and recognition. To address it, we propose a new variable selection method, MPA-MMIFS. It is based on mutual information and combined with Model Population Analysis (MPA). The relevance between the input variables and the response is maximized, while the redundancy among the selected variables is minimized. In addition, the regression coefficients of Partial Least Squares Linear Discriminant Analysis (PLS-LDA) are introduced to adjust the variable importance. The proposed method was tested on three real-world datasets: gene expression data of Estrogen, metabolomics data of type 2 diabetes mellitus, and near-infrared spectroscopy data of vinegar. On each, variables were selected and models were built, and both cross-validation (CV) and double cross-validation (DCV) were used to assess the models. Compared with other methods (MIFS, MMIFS, and GA), the proposed method achieved competitive performance.
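The relevance/redundancy trade-off and the model-population idea can be illustrated with a small, dependency-free sketch. It is not the authors' MPA-MMIFS. The selector below uses Battiti's classic MIFS criterion: pick the feature maximizing I(f; y) minus β times the summed I(f; s) over already-selected features s. The MMIFS modification and the PLS-LDA coefficient weighting are not reproduced. Several pieces are assumptions: equal-width binning as the MI estimator, β = 0.5, and the MPA-style loop that reruns selection on random sub-samples and counts selection frequencies. The function names (`discretize`, `mutual_information`, `mifs_select`, `mpa_frequency`) are hypothetical as well.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning so MI can be estimated from counts.
    (Assumption: the abstract does not say how MI is estimated.)"""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mifs_select(columns, y, k, beta=0.5):
    """Greedy MIFS-style selection: maximize relevance I(f;y) minus
    beta times the summed redundancy I(f;s) with already-selected s."""
    relevance = [mutual_information(col, y) for col in columns]
    selected, remaining = [], set(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected)
            return relevance[j] - beta * redundancy
        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


def mpa_frequency(columns, y, k, n_models=100, ratio=0.8, seed=0):
    """Hypothetical MPA-style loop: run the selector on many random
    sub-samples and count how often each feature is selected."""
    rng = random.Random(seed)
    n = len(y)
    m = max(2, int(ratio * n))
    freq = [0] * len(columns)
    for _ in range(n_models):
        idx = rng.sample(range(n), m)
        sub_cols = [[col[i] for i in idx] for col in columns]
        sub_y = [y[i] for i in idx]
        for j in mifs_select(sub_cols, sub_y, k):
            freq[j] += 1
    return freq


if __name__ == "__main__":
    # Toy data: a 4-class response encoded by two independent bits.
    y = [0, 0, 1, 1, 2, 2, 3, 3]
    high_bit = [0, 0, 0, 0, 1, 1, 1, 1]
    low_bit = [0, 0, 1, 1, 0, 0, 1, 1]
    noise = [0, 1, 0, 1, 0, 1, 0, 1]
    columns = [high_bit, list(high_bit), low_bit, noise]  # index 1 duplicates 0
    print(mifs_select(columns, y, k=2))  # -> [0, 2]
    print(mpa_frequency(columns, y, k=2, n_models=50))
```

In the toy data, features 0, 1, and 2 are equally relevant to the response. Ranking by relevance alone could therefore return the duplicate pair [0, 1]. The redundancy penalty instead picks [0, 2], the two complementary bits. Continuous spectral or metabolomic variables would first be passed through `discretize` (or another MI estimator) before selection.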