
Research On Feature Selection And Integration Method And Its Applications

Posted on: 2022-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Yang
GTID: 2517306491977209
Subject: Applied Statistics
Abstract/Summary:
In recent years, with the development of the Internet and the information industry, the mining and application of big data has entered a new wave of growth. The spread of globalized services such as e-commerce, O2O (online-to-offline), intelligent logistics, finance, and communications has caused terabytes of data to flood explosively into production and daily life. Along with more useful information comes a large amount of redundant data and complex features, so it has become very important to extract clean, useful features from raw data through dimensionality reduction and related methods. Of the two classical approaches to dimensionality reduction, feature selection and feature extraction, this paper studies feature selection and applies it to common real-world classification problems, because feature selection is highly interpretable and does not change the structure of the original features.

First, because a single feature selection method is not sufficient to meet our goal of improving classification accuracy, this paper draws on the strengths of ensemble learning and adopts an "ensemble feature selection method". After analyzing the time complexity of the three families of feature selection methods (filter, wrapper, and embedded), and considering the limited computing power of an ordinary computer on high-dimensional data, we choose three filter methods as base selectors: variance feature selection, mutual information feature selection, and chi-square feature selection. Their results are combined with the weighted-average method from ensemble learning to compute feature importance weights.

Second, the proposed feature selection method, MI-VA-CH, is applied to the UCI mushroom classification data set and compared with two commonly used, well-performing feature selection algorithms, SVM-RFE and RF (random forest). For model training, seven classification models (KNN, SVM, RF, decision tree, XGBoost, Gaussian naive Bayes, and a neural network) and a voting ensemble are compared.

Finally, in terms of prediction results, the best accuracy achieved in model training with the feature subset from the proposed ensemble feature selection method is 0.9769, which is 27% higher than the accuracy of the comparison method RF and only 2% lower than that of SVM-RFE. The selected feature subset is stable, and classification accuracy is consistently high across the individual classification models. In terms of running time, the MI-VA-CH algorithm takes about 12 s on average, compared with 62 s for SVM-RFE and 35 s for RF. Taking prediction accuracy, feature-subset stability, and time complexity together, the proposed ensemble feature selection method performs well.
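The core idea of MI-VA-CH, scoring every feature with three filter selectors (mutual information, variance, chi-square) and merging the scores by a weighted average, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the author's implementation. The abstract does not give the weights, how scores on different scales are made comparable, or which chi-square variant is used, so the sketch assumes equal weights, min-max normalization of each selector's scores, and the Pearson chi-square statistic of the feature-by-class contingency table. Features are assumed to be integer-coded categoricals (as the mushroom attributes would be after label encoding). All function names (`mutual_info`, `chi_square`, `ensemble_scores`, `select_top_k`) are hypothetical.

```python
import numpy as np


def _contingency(x, y):
    """Count table of (feature value, class label) pairs."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)  # unbuffered increment for repeated index pairs
    return table


def mutual_info(x, y):
    """Discrete (plug-in) mutual information I(X; Y) in nats."""
    p = _contingency(x, y) / len(x)
    expected = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)  # p(x)p(y)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / expected[nz])))


def chi_square(x, y):
    """Pearson chi-square statistic of the feature-by-class table.

    Assumption: scikit-learn's chi2 scorer computes a different statistic
    that treats raw non-negative feature values as frequencies.
    """
    obs = _contingency(x, y)
    exp = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / obs.sum()
    return float(np.sum((obs - exp) ** 2 / exp))


def _minmax(s):
    """Rescale scores to [0, 1]; a constant score vector maps to all zeros."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


def ensemble_scores(X, y, weights=(1.0, 1.0, 1.0)):
    """Weighted average of normalized MI, variance and chi-square scores.

    Assumptions: equal weights by default, min-max normalization per selector.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    cols = range(X.shape[1])
    mi = np.array([mutual_info(X[:, j], y) for j in cols])
    va = X.var(axis=0).astype(float)
    ch = np.array([chi_square(X[:, j], y) for j in cols])
    stacked = np.vstack([_minmax(mi), _minmax(va), _minmax(ch)])  # shape (3, n_features)
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ stacked  # weights are renormalized to sum to 1


def select_top_k(X, y, k, weights=(1.0, 1.0, 1.0)):
    """Indices of the k highest-scoring features, plus all scores."""
    scores = ensemble_scores(X, y, weights)
    return np.argsort(scores)[::-1][:k], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200)
    informative = 2 * y + rng.integers(0, 2, size=200)  # 0-1 for class 0, 2-3 for class 1
    noise = rng.integers(0, 4, size=200)                # independent of y
    constant = np.zeros(200, dtype=int)                 # zero variance, no information
    X = np.column_stack([noise, informative, constant])
    idx, scores = select_top_k(X, y, k=1)
    print(idx)  # the informative column (index 1) ranks first
```

Normalizing each selector to [0, 1] before averaging keeps one selector from dominating simply because its raw statistic is larger: a chi-square value can run into the hundreds, while mutual information is bounded by the class entropy. Note that the variance score of a label-encoded categorical feature depends on the arbitrary integer codes assigned to its categories.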
Keywords/Search Tags: Feature selection method, Ensemble learning algorithm, MI-VA-CH, Hybrid model