Feature selection is the process of choosing an optimal feature subset from an original feature set that contains redundant, irrelevant, and noisy features, while preserving the information of the original dataset with as few features as possible. As an effective dimensionality-reduction technique, it improves the prediction, classification, and computational performance of machine learning and data mining models by eliminating redundant, irrelevant, and noisy features from high-dimensional data. In feature selection, the search for feature subsets can be regarded as a combinatorial optimization problem, i.e., the process of seeking the optimal subset in a finite feature space, while the evaluation of feature subsets can be regarded as a classification problem, i.e., the process of assessing candidate subsets with a classifier. To improve the performance of feature selection, this thesis addresses both aspects: it proposes an improved whale optimization algorithm for feature subset search and an improved K-nearest neighbor algorithm for feature subset evaluation, and combines the two into a wrapper feature selection method.

Firstly, to remedy the slow convergence and weak global search ability of the whale optimization algorithm in feature subset search, elite individuals generated by chaotic opposition-based learning are introduced to enhance the diversity of the initial population, and the individual selection preference and adaptive position update of whales are modeled with a skewed distribution and nonlinear perturbation parameters. On this basis, an improved whale optimization algorithm with individual selection preference and an adaptive position update mechanism is proposed. Comparisons of 9 metaheuristic algorithms on 20 benchmark functions in 30-dimensional and 100-dimensional spaces show that, under the same simulation environment, the improved algorithm not only effectively balances local exploitation and global exploration, but also exhibits better stability and reliability when solving benchmark functions of different dimensions.

Secondly, to address the low classification accuracy of the K-nearest neighbor algorithm in feature subset evaluation, a weighted voting criterion is introduced, and a simulated annealing algorithm is used to construct a similarity weight matrix M between samples, which raises the importance of discriminative sample attributes in the classification computation. On this basis, an improved K-nearest neighbor algorithm based on the weight matrix M and a weighted classification strategy is proposed. Experiments with 6 classifiers on 8 classification datasets show that, under the same conditions, the improved K-nearest neighbor algorithm not only achieves better classification performance but also shows better robustness across datasets.

Finally, the improved whale optimization algorithm and the improved K-nearest neighbor algorithm are used as the feature subset search and feature subset evaluation components, respectively, and a wrapper feature selection method based on the two is designed. Experiments with 7 feature selection methods on 15 classification datasets show that, under the same conditions, the designed method performs better in both feature subset search and feature subset evaluation. Moreover, when an original dataset is processed by the designed method, redundant and irrelevant features are removed quickly and effectively, which is of practical significance for subsequent data engineering.

The thesis contains 25 figures, 17 tables, and 126 references.
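The chaotic opposition-based initialization and whale search loop described above can be sketched as follows. This is a minimal illustration only: it assumes a logistic chaotic map, standard opposition-based learning, and the canonical whale optimization update rules, since the abstract does not give the exact skewed-distribution or nonlinear-perturbation formulas of the improved algorithm.

```python
import numpy as np

def sphere(x):
    """Benchmark objective: f(x) = sum(x_i^2), minimum 0 at the origin."""
    return float(np.sum(x ** 2))

def chaotic_opposition_init(n, dim, lb, ub, fitness, seed=0):
    # Logistic chaotic map spreads candidate points over the search space;
    # opposition-based learning then keeps the fitter half of each point
    # and its mirror image, giving an "elite" initial population.
    rng = np.random.default_rng(seed)
    c = rng.uniform(0.1, 0.9, size=(n, dim))
    for _ in range(50):
        c = 4.0 * c * (1.0 - c)              # logistic map x <- 4x(1-x)
    pop = lb + c * (ub - lb)                 # map chaos values into bounds
    opp = lb + ub - pop                      # opposition points
    both = np.vstack([pop, opp])
    fit = np.array([fitness(x) for x in both])
    return both[np.argsort(fit)[:n]]         # keep the best n of 2n

def woa(fitness, dim, n=20, iters=200, lb=-5.0, ub=5.0, seed=0):
    # Canonical whale optimization loop, started from the elite population.
    rng = np.random.default_rng(seed)
    pop = chaotic_opposition_init(n, dim, lb, ub, fitness, seed)
    fit = np.array([fitness(x) for x in pop])
    i0 = int(fit.argmin())
    best, best_f = pop[i0].copy(), fit[i0]
    for t in range(iters):
        a = 2.0 * (1.0 - t / iters)          # control parameter, 2 -> 0
        for i in range(n):
            A = 2.0 * a * rng.random(dim) - a
            C = 2.0 * rng.random(dim)
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1.0):  # exploit: encircle the best
                    pop[i] = best - A * np.abs(C * best - pop[i])
                else:                        # explore: move toward a random whale
                    rand = pop[rng.integers(n)]
                    pop[i] = rand - A * np.abs(C * rand - pop[i])
            else:                            # spiral (bubble-net) update
                l = rng.uniform(-1.0, 1.0, dim)
                pop[i] = (np.abs(best - pop[i]) * np.exp(l)
                          * np.cos(2 * np.pi * l) + best)
            pop[i] = np.clip(pop[i], lb, ub)
            fit[i] = fitness(pop[i])
            if fit[i] < best_f:
                best, best_f = pop[i].copy(), fit[i]
    return best, best_f
```

On a low-dimensional sphere function this sketch converges toward the origin; the thesis's improvements to preference-driven selection and adaptive updates would replace the fixed 0.5 branching probability and the linear decay of `a`.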
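The weighted K-nearest neighbor evaluation can likewise be sketched. This is a simplified, hypothetical rendering: the similarity weight matrix M is reduced to a per-feature weight vector, the weighted vote uses inverse-distance weighting, and a toy simulated annealing loop tunes the weights by leave-one-out accuracy; the thesis's actual construction of M is not specified in the abstract.

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, w, k=3):
    # Feature-weighted Euclidean distance; w stands in for the learned
    # similarity weight matrix M, simplified to one weight per feature.
    d = np.sqrt((((X_train - x) ** 2) * w).sum(axis=1))
    order = np.argsort(d)[:k]
    votes = defaultdict(float)
    for j in order:
        votes[y_train[j]] += 1.0 / (d[j] + 1e-12)  # inverse-distance vote
    return max(votes, key=votes.get)

def loo_accuracy(X, y, w, k=3):
    # Leave-one-out accuracy of the weighted KNN under weights w.
    hits = 0
    for i in range(len(X)):
        Xt = np.delete(X, i, axis=0)
        yt = np.delete(y, i)
        hits += int(weighted_knn_predict(Xt, yt, X[i], w, k) == y[i])
    return hits / len(X)

def anneal_feature_weights(X, y, k=3, iters=100, seed=0):
    # Toy simulated annealing over feature weights: perturb, score by
    # leave-one-out accuracy, accept worse moves with probability
    # exp(delta / T), and remember the best weights seen.
    rng = np.random.default_rng(seed)
    w = np.ones(X.shape[1])
    score = loo_accuracy(X, y, w, k)
    best_w, best_s, T = w.copy(), score, 1.0
    for _ in range(iters):
        cand = np.abs(w + rng.normal(0.0, 0.2, size=w.shape))
        s = loo_accuracy(X, y, cand, k)
        if s >= score or rng.random() < np.exp((s - score) / max(T, 1e-9)):
            w, score = cand, s
            if s > best_s:
                best_w, best_s = cand.copy(), s
        T *= 0.95
    return best_w, best_s
```

Distance-weighted voting lets close neighbors dominate the decision, and learned feature weights down-weight noisy attributes, which is the intuition behind the improved evaluation step described above.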
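Finally, the wrapper scheme itself can be illustrated with a minimal sketch. The fitness below is the common wrapper trade-off alpha * error + (1 - alpha) * subset_size / dim, evaluated with a plain leave-one-out KNN; for brevity the search is a simple stochastic bit-flip local search standing in for the improved whale optimization algorithm, so both the search operator and the constant alpha are assumptions, not the thesis's method.

```python
import numpy as np

def knn_loo_error(X, y, mask, k=3):
    # Leave-one-out error of a plain KNN restricted to the selected features.
    if not mask.any():
        return 1.0                          # empty subset: worst possible
    Xs = X[:, mask]
    wrong = 0
    for i in range(len(Xs)):
        d = np.sqrt(((Xs - Xs[i]) ** 2).sum(axis=1))
        d[i] = np.inf                       # exclude the query point itself
        idx = np.argsort(d)[:k]
        vals, counts = np.unique(y[idx], return_counts=True)
        wrong += int(vals[counts.argmax()] != y[i])
    return wrong / len(Xs)

def wrapper_select(X, y, alpha=0.99, iters=300, seed=0):
    # Wrapper fitness balancing classification error against subset size;
    # the bit-flip local search below is a stand-in for the binary search
    # behavior of the improved whale optimization algorithm.
    rng = np.random.default_rng(seed)
    dim = X.shape[1]

    def fit(m):
        return alpha * knn_loo_error(X, y, m) + (1 - alpha) * m.sum() / dim

    best = rng.random(dim) < 0.5            # random initial feature mask
    best_f = fit(best)
    for _ in range(iters):
        cand = best.copy()
        j = rng.integers(dim)
        cand[j] = not cand[j]               # flip one feature in/out
        f = fit(cand)
        if f <= best_f:                     # accept improvements and plateaus
            best, best_f = cand.copy(), f
    return best, best_f
```

On data where only a few features carry class information, this loop tends to settle on a small accurate subset, which is exactly the behavior the designed feature selection method aims for at scale.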