
Two Feature Selection Algorithms Based On Mutual Information And Bayesian Optimization

Posted on: 2019-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Li
Full Text: PDF
GTID: 2428330566483243
Subject: Mathematics
Abstract/Summary:
With the evolution of machine learning and big data, data volumes and data dimensions have grown exponentially year by year. Applied to such data, traditional data mining methods suffer from both low learning speed and reduced accuracy. Feature selection is a common dimensionality reduction method in machine learning, and with the rise of big data it has attracted wide attention in industry. Feature selection refers to choosing a subset of the original feature set, under some evaluation criterion, such that a classification or regression algorithm achieves its best result. Its advantages are that it reduces the feature dimension, avoids the curse of dimensionality, and improves computational efficiency. In addition, classification or regression after feature selection usually performs better, because most of the selected features are informative. However, most existing methods select features according to the relevance, complementarity, and redundancy of individual features with respect to a single target Y, and they almost never consider combinations of features. On the other hand, the conventional approach to parameter optimization of machine learning algorithms is to derive a penalty function and then tune the parameters empirically or exhaustively so as to maximize or minimize it; when there are many parameters, large amounts of data, and many features, a more effective parameter optimization method is needed. On the basis of a detailed survey of domestic and foreign literature, we contribute two algorithms related to feature selection.

Firstly, we propose a combination feature selection algorithm based on mutual information. Existing methods do not take combined features into account: for example, attributes A and B may each contain only a very small amount of information about Y, or even be completely independent of Y, while A & B together provide a great deal of information about Y, or even completely determine it. Based on this observation, we propose a feature selection algorithm that extracts both combination features and single features from a feature set. Insignificant features are combined, new candidate features are generated according to the conditional probability distribution table, and features are then selected under the maximum-relevance, minimum-redundancy criterion. Finally, experiments on synthetic and real datasets show that the algorithm better mines the combined feature information of a dataset and, to a certain extent, improves the accuracy of the corresponding machine learning algorithms.

Secondly, we propose a new Bayesian-optimized Xgboost algorithm. Using the Xgboost framework involves tuning many parameters, and the choice of parameter combination strongly affects the classification performance of the model. Traditional parameter optimization methods derive a penalty function and then adjust parameter values empirically or exhaustively to maximize or minimize it, but one often encounters models without an explicit expression ("black boxes"), whose parameter optimization is troublesome and introduces uncertainty and randomness into the algorithm. In this paper, a Bayesian optimization algorithm based on Gaussian processes (GP) is used to optimize the parameters of the Xgboost framework, and a new algorithm, GP_Xgboost, is proposed. Experimental results show that the improved algorithm outperforms manual tuning and exhaustive search, which demonstrates its feasibility and effectiveness.
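The A-and-B example above can be made concrete with a small sketch, assuming nothing beyond NumPy and the standard library. The XOR target and the pairing of A and B into one joint feature are illustrative choices, not the thesis's construction: two binary attributes that are each independent of Y can jointly determine it, and empirical mutual information makes this visible.

```python
import math
from collections import Counter

import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I(X; Y) in bits from paired samples."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # I(X;Y) = sum over (a,b) of p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )

# XOR target: A and B are each (nearly) independent of Y,
# yet the combined feature A & B determines Y exactly.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, 10_000)
B = rng.integers(0, 2, 10_000)
Y = A ^ B
AB = list(zip(A, B))  # combined feature: the joint value of (A, B)

print(round(mutual_information(A, Y), 3))   # ≈ 0.0
print(round(mutual_information(B, Y), 3))   # ≈ 0.0
print(round(mutual_information(AB, Y), 3))  # ≈ 1.0 (one full bit: A & B determine Y)
```

A single-feature filter scoring I(A; Y) and I(B; Y) would discard both attributes; only a method that evaluates the combined feature recovers the full bit of information about Y.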
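The GP-based Bayesian optimization loop behind an approach like GP_Xgboost can be sketched in miniature. In this sketch the quadratic `validation_loss` stands in for an expensive Xgboost cross-validation score, and the single log-learning-rate dimension, the RBF kernel, and the expected-improvement acquisition are illustrative assumptions, not the thesis's exact setup:

```python
import math
import numpy as np

def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and stddev at test points Xs, given observations (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # rbf(x, x) == 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected drop below the best loss observed so far."""
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (best - mu) * Phi + sigma * phi

# Hypothetical black-box objective: validation loss as a function of
# x = log10(learning rate), minimized at learning rate 0.1 (x = -1).
def validation_loss(x):
    return (x + 1.0) ** 2

X = np.array([-3.0, -1.5, 0.0])     # initial design points
y = validation_loss(X)
grid = np.linspace(-3.0, 0.0, 301)  # candidate log10 learning rates

for _ in range(10):  # BO loop: fit GP, maximize acquisition, evaluate objective
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)
    y = np.append(y, validation_loss(x_next))

best_lr = 10 ** X[np.argmin(y)]
print(f"best learning rate ≈ {best_lr:.3f}")
```

The point of the surrogate model is that each new evaluation is placed where expected improvement is highest, so far fewer objective evaluations are needed than with grid or exhaustive search; in the real setting each evaluation would be a full Xgboost cross-validation run over several parameters rather than one cheap function call.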
Keywords/Search Tags: feature selection, mutual information, combination feature, Bayesian optimization, Xgboost