Feature selection is the process of extracting an optimal subset of features from the original feature set in order to reduce the dimensionality of the data. It is an important step in improving the classification accuracy of big-data classification algorithms. Generally speaking, feature selection can be treated as an optimisation problem. For a feature set with n features, the search space contains 2^n - 1 candidate subsets, and finding the optimal feature subset has been proved to be an NP-hard problem. Only an exhaustive search can guarantee finding the optimal feature subset, but when the feature set is very large, exhaustive search requires enormous computational resources, making it impractical for the feature selection problem. Currently, heuristics are widely used for feature selection, and hybrid metaheuristics are one of the most popular trends in dealing with optimisation problems.

The main contributions of this paper are as follows.

(1) Each feature is mapped into a one-dimensional binary vector using the given formula; each vector contains two cells, where the first cell holds the feature number and the second cell records whether the feature is selected. This paper then analyses, from the perspective of the expected value of the search result, the influence of the adjustment parameter r1(t) and the random number r3 on the performance of the SCA algorithm during the search for the optimum. This analysis reveals a problem: if the SCA algorithm does not locate the region containing the optimal solution in its first iteration, the subsequent iterative search will not find the optimal solution either.

(2) A new feature selection technique derived from the standard sine cosine algorithm (SCA), namely MetaSCA, is proposed. On the basis of SCA, a golden sine
factor is added to narrow the search area for feature selection. In addition, a multi-stage adjustment factor strategy is used to coordinate the balance between global and local search. The multi-stage adjustment factor allows the algorithm to alternate between global search and local search (global-local-global-local) while searching for the best features, so that both can be performed multiple times, the search space is fully explored, and the standard SCA's tendency to become trapped in local search late in the iterations is avoided. Further, the improved MetaSCA technique is applied to the selection of the optimal feature subset.

(3) The MetaSCA algorithm is compared with the standard SCA algorithm, the particle swarm algorithm, and the grey wolf algorithm on six commonly used test functions, to verify that the multi-stage adjustment factor strategy and the golden sine factor strategy improve the search performance of the standard SCA algorithm. The plots of the four algorithms on the different test functions show that the proposed MetaSCA achieves better results in both convergence speed and accuracy. Next, to evaluate the performance of MetaSCA on optimal feature subset selection, seven evaluation metrics were used: average fitness, worst fitness, optimal fitness, classification accuracy, average proportion of the optimal feature subset, feature selection time, and standard deviation. Performance was evaluated on seven commonly used UCI datasets, and the results obtained by MetaSCA were compared with those of three algorithms: the standard sine cosine algorithm (SCA), particle swarm optimisation (PSO), and the whale optimisation algorithm (WOA). The results from the simulation data
show that, in most cases, the MetaSCA technique achieves the highest classification accuracy and the smallest optimal feature subset in feature selection on the UCI datasets.
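Contribution (1) describes a two-cell encoding in which each feature carries its own index together with a selection flag. The exact mapping formula appears later in the paper; as a rough NumPy sketch (population size, feature count, and the function name are illustrative assumptions, not the paper's notation), the encoding and the recovery of the selected subset might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_population(pop_size, n_features):
    """Each individual is an (n_features, 2) integer array:
    column 0 holds the feature number (cell 1),
    column 1 holds 1 if that feature is selected, else 0 (cell 2)."""
    pop = np.zeros((pop_size, n_features, 2), dtype=int)
    pop[:, :, 0] = np.arange(n_features)                        # cell 1: feature number
    pop[:, :, 1] = rng.integers(0, 2, (pop_size, n_features))   # cell 2: selected or not
    return pop

pop = init_population(pop_size=5, n_features=8)
# Indices of the features chosen by the first individual:
selected = pop[0][pop[0][:, 1] == 1, 0]
```

Reading the subset back is then a single boolean mask over column 1, which keeps the evaluation of a candidate solution cheap inside the optimisation loop.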
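Contribution (2)'s alternating global-local-global-local strategy replaces SCA's single linear decay of the adjustment parameter with a multi-stage schedule. The paper's concrete formulas are not reproduced in this excerpt, so the piecewise r1(t) below (the four-phase split and decay bounds are illustrative assumptions) only sketches the idea on top of the standard SCA position update:

```python
import numpy as np

def r1_multistage(t, T, a=2.0):
    """Hypothetical multi-stage adjustment factor: instead of SCA's single
    linear decay r1 = a - a*t/T, alternate between a global phase (large r1,
    exploration) and a local phase (small r1, exploitation) several times."""
    phase = (4 * t) // T            # four phases: global, local, global, local
    frac = (4 * t) / T - phase      # progress within the current phase
    if phase % 2 == 0:              # global phase: r1 decays from a to a/2
        return a - (a / 2) * frac
    return a / 2 - (a / 2) * frac   # local phase: r1 decays from a/2 to 0

def sca_step(x, best, t, T, rng):
    """One standard SCA position update, using the multi-stage r1 above."""
    r1 = r1_multistage(t, T)
    r2 = rng.uniform(0, 2 * np.pi, x.shape)   # direction of movement
    r3 = rng.uniform(0, 2, x.shape)           # random weight on the best solution
    r4 = rng.uniform(0, 1, x.shape)           # sine/cosine branch switch
    step = np.where(r4 < 0.5,
                    r1 * np.sin(r2) * np.abs(r3 * best - x),
                    r1 * np.cos(r2) * np.abs(r3 * best - x))
    return x + step

rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, 4)
y = sca_step(x, best=np.zeros(4), t=10, T=100, rng=rng)
```

Because r1 jumps back up at the start of each global phase, the search can escape a local region even late in the run, which is the behaviour the multi-stage strategy is meant to provide.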