
Two Feature Selection Algorithms Based On Mutual Information And Bayesian Optimization

Posted on: 2019-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Li
Full Text: PDF
GTID: 2428330566483243
Subject: Mathematics
Abstract/Summary:
With the evolution of machine learning and big data, data volumes and data dimensions have grown exponentially year by year. Applied to such data, traditional data mining methods suffer from both low learning speed and reduced accuracy. Feature selection is a common dimensionality reduction method in machine learning, and with the rise of big data it has attracted wide attention in industry. Feature selection refers to choosing a subset of the original feature set, under some evaluation criterion, such that a classification or regression algorithm achieves its best result. Its advantages are that it reduces the feature dimension, avoids the curse of dimensionality, and improves computational efficiency. In addition, classification or regression after feature selection usually performs better, because most of the selected features are informative. However, most existing methods select features according to the relevance, complementarity, and redundancy of individual features with respect to a single target Y, and they almost never consider combinations of features. On the other hand, the conventional approach to parameter optimization of machine learning algorithms is to derive a penalty function and then tune the parameters empirically or exhaustively so as to maximize or minimize it; when there are many parameters, large amounts of data, and many features, a more effective parameter optimization method is needed. On the basis of a detailed survey of domestic and foreign literature, we contribute two algorithms related to feature selection.

Firstly, we propose a combination feature selection algorithm based on mutual information. Existing methods do not take combined features into account: for example, attributes A and B may each contain only a very small amount of information about Y, or even be completely independent of Y, while A & B together provide a great deal of information about Y, or even completely determine it. Based on this observation, we propose a feature selection algorithm that extracts both combination features and single features from a feature set. Insignificant features are combined, new candidate features are generated according to the conditional probability distribution table, and features are then selected under the maximum-relevance, minimum-redundancy criterion. Finally, experiments on synthetic and real datasets show that the algorithm better mines the combined feature information of a dataset and, to a certain extent, improves the accuracy of the corresponding machine learning algorithms.

Secondly, we propose a new Bayesian-optimized Xgboost algorithm. Using the Xgboost framework involves tuning many parameters, and the choice of parameter combination strongly affects the classification performance of the model. Traditional parameter optimization methods derive a penalty function and then adjust parameter values empirically or exhaustively to maximize or minimize it, but one often encounters models without an explicit expression ("black boxes"), whose parameter optimization is troublesome and introduces uncertainty and randomness into the algorithm. In this paper, a Bayesian optimization algorithm based on Gaussian processes (GP) is used to optimize the parameters of the Xgboost framework, and a new algorithm, GP_Xgboost, is proposed. Experimental results show that the improved algorithm outperforms manual tuning and exhaustive search, which demonstrates its feasibility and effectiveness.
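The A-and-B example above can be made concrete with a small sketch, assuming nothing beyond NumPy and the standard library. The XOR target and the pairing of A and B into one joint feature are illustrative choices, not the thesis's construction: two binary attributes that are each independent of Y can jointly determine it, and empirical mutual information makes this visible.

```python
import math
from collections import Counter

import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I(X; Y) in bits from paired samples."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # I(X;Y) = sum over (a,b) of p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )

# XOR target: A and B are each (nearly) independent of Y,
# yet the combined feature A & B determines Y exactly.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, 10_000)
B = rng.integers(0, 2, 10_000)
Y = A ^ B
AB = list(zip(A, B))  # combined feature: the joint value of (A, B)

print(round(mutual_information(A, Y), 3))   # ≈ 0.0
print(round(mutual_information(B, Y), 3))   # ≈ 0.0
print(round(mutual_information(AB, Y), 3))  # ≈ 1.0 (one full bit: A & B determine Y)
```

A single-feature filter scoring I(A; Y) and I(B; Y) would discard both attributes; only a method that evaluates the combined feature recovers the full bit of information about Y.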
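The GP-based Bayesian optimization loop behind an approach like GP_Xgboost can be sketched in miniature. In this sketch the quadratic `validation_loss` stands in for an expensive Xgboost cross-validation score, and the single log-learning-rate dimension, the RBF kernel, and the expected-improvement acquisition are illustrative assumptions, not the thesis's exact setup:

```python
import math
import numpy as np

def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and stddev at test points Xs, given observations (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # rbf(x, x) == 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected drop below the best loss observed so far."""
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (best - mu) * Phi + sigma * phi

# Hypothetical black-box objective: validation loss as a function of
# x = log10(learning rate), minimized at learning rate 0.1 (x = -1).
def validation_loss(x):
    return (x + 1.0) ** 2

X = np.array([-3.0, -1.5, 0.0])     # initial design points
y = validation_loss(X)
grid = np.linspace(-3.0, 0.0, 301)  # candidate log10 learning rates

for _ in range(10):  # BO loop: fit GP, maximize acquisition, evaluate objective
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)
    y = np.append(y, validation_loss(x_next))

best_lr = 10 ** X[np.argmin(y)]
print(f"best learning rate ≈ {best_lr:.3f}")
```

The point of the surrogate model is that each new evaluation is placed where expected improvement is highest, so far fewer objective evaluations are needed than with grid or exhaustive search; in the real setting each evaluation would be a full Xgboost cross-validation run over several parameters rather than one cheap function call.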
Keywords/Search Tags: feature selection, mutual information, combination feature, Bayesian optimization, Xgboost