
Study On Feature Selection Based On Maximum Weight And Minimum Redundancy

Posted on: 2017-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: J M Zhang
Full Text: PDF
GTID: 2349330488958103
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of modern science and technology, data is growing at an explosive rate, and the amount of irrelevant and redundant information contained in it is increasing at the same time. This poses growing challenges for existing machine learning algorithms, so there is an urgent need for feature selection methods that offer better overall accuracy and computational efficiency on massive data. This dissertation focuses on feature selection for high-dimensional datasets.

Firstly, to address the shortcomings of feature weight measures built on different within-class and between-class divergence metrics, we propose a new feature weight metric function with generalized characteristics, and we discuss in detail its relationship to other feature weight measures based on within-class and between-class divergence.

Secondly, to overcome the lack of diversity in existing evaluations of relevance and redundancy and the difficulty of determining the best number of features, we put forward a feature selection method based on Maximum Average Weight and Minimum Average Redundancy (MAWMAR). On the one hand, MAWMAR adopts the generalized feature weight measure based on within-class and between-class scatter, which makes the selection process easier to understand and analyze and allows it to handle both semi-supervised and supervised problems. On the other hand, by building a fractional programming model on maximum average weight and minimum average redundancy, MAWMAR makes the trade-off between the most informative and least redundant features inside the subset fully explicit and determines the optimal number of features. Experimental results demonstrate that MAWMAR obtains a smaller and better feature subset than other feature selection methods and improves the prediction accuracy of the classifier.

Thirdly, since the method based on maximum average weight and minimum average redundancy is sensitive to the number of features, this paper introduces a feature selection method based on Maximum Total Weight and Minimum Redundancy (MaToWMiR). Compared with MAWMAR, MaToWMiR not only uses the generalized weight calculation method but also reduces the influence of the number of features. Experimental results show that MaToWMiR can effectively remove irrelevant and redundant features and improve the performance of machine learning algorithms.

Finally, to examine the scope of application of MAWMAR and MaToWMiR, this paper makes a comparative analysis of the models underlying the two methods. First, the branch and bound method is used to solve the models of MAWMAR and MaToWMiR; comparing the resulting classification accuracy and the number of selected features shows that the two methods have advantages on different datasets, and the main factors affecting the results are analyzed. Second, to address the low efficiency of the exact algorithm, this paper adopts a genetic algorithm with high accuracy to solve the models; numerical experiments show that MAWMAR and MaToWMiR retain their respective advantages on datasets with similar characteristics (for example, similar weight distributions and redundancy distribution margins), whether solved by branch and bound or by the genetic algorithm.
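As a concrete illustration of a feature weight built from within-class and between-class scatter, the sketch below computes a Fisher-score-style ratio for each feature. This is a generic example only, not the generalized metric proposed in the dissertation; the function name and the small epsilon constant are assumptions made for illustration.

```python
import numpy as np

def scatter_based_weights(X, y):
    """Score each feature by the ratio of between-class to within-class scatter.

    X: (n_samples, n_features) array; y: class labels.
    Illustrative Fisher-score-style weight, not the dissertation's metric.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        # Between-class scatter: class mean deviation weighted by class size.
        between += Xc.shape[0] * (class_mean - overall_mean) ** 2
        # Within-class scatter: spread of samples around their class mean.
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Small constant avoids division by zero for constant features.
    return between / (within + 1e-12)
```

Features with large between-class and small within-class scatter receive high weights, which matches the intuition behind scatter-based relevance measures.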
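The maximum-weight / minimum-redundancy trade-off can be illustrated with a simple greedy selection in the style of mRMR. Note that this is only a sketch: the dissertation formulates the trade-off as a fractional programming model that also determines the optimal subset size, whereas here the subset size k is fixed by the caller, redundancy is measured by absolute Pearson correlation, and the function name is hypothetical.

```python
import numpy as np

def greedy_weight_redundancy_selection(X, weights, k):
    """Greedily pick k features, trading feature weight against average redundancy.

    weights: precomputed relevance score per feature (e.g. a scatter-based weight).
    Simplified mRMR-style sketch of the maximum-weight / minimum-redundancy idea.
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Pairwise redundancy: absolute Pearson correlation between features.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(weights))]  # start from the heaviest feature
    remaining = set(range(X.shape[1])) - set(selected)
    while len(selected) < k and remaining:
        # Score = relevance minus average correlation with already chosen features.
        scores = {j: weights[j] - corr[j, selected].mean() for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In use, the weights could come from the scatter-based sketch above and k from cross-validation; the exact and genetic-algorithm solvers discussed in the abstract address the harder problem of optimizing the subset (and its size) globally rather than greedily.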
Keywords/Search Tags:Feature selection, filter method, within-class scatter, between-class scatter