| The era of big data has brought an increase in data volume. As data carries more value, it also places higher demands on computing performance. When business and background knowledge is lacking, too many variables enter the analysis and computing costs rise, which poses challenges for numerical analysis, sampling, combination, machine learning, and data mining. Dimensionality reduction is a good solution: the number of variables is reduced while modeling performance is preserved. To address redundant variables in a big data environment, this paper proposes a two-stage feature selection scheme that uses the Boruta feature selection algorithm as a sub-algorithm. Building on previous research on dimensionality reduction algorithms, this paper takes image data as the research object and combines the Boruta feature selection algorithm with an intelligent optimization algorithm to model the MNIST and JAFFE data sets. In the one-stage design, 8 models, including random forest regression, random forest classification, and gradient boosting classification, were used as sub-models of Boruta. The results are mixed: different sub-models show different characteristics and suit different types of data sets. A two-stage heuristic optimization algorithm is then designed on the basis of the one-stage dimensionality reduction. Using two stages significantly improves prediction accuracy, and, based on the characteristics of the second stage, an additional experimental stage is designed that reduces the number of variables in the overall scheme by a further 5% to 30%. Finally, a stability coefficient index is designed to evaluate the stability of the model. The results show that the two-stage Boruta feature selection algorithm helps reduce the dimensionality of the data. Besides speeding up modeling, it also improves the prediction accuracy of the model on the test set, and stability is higher than before dimensionality reduction. This indicates that the two-stage feature selection scheme proposed in this paper is highly feasible and reliable. At the end of the paper, a large amount of "star face" data was collected, and the proposed dimensionality reduction scheme was applied to dimensionality reduction modeling and a face recognition project. The results show that the scheme is also effective on more complex face data. The work also produces a face recognition applet that can be used in engineering or business settings in the future. The innovation of this paper lies in a two-stage feature selection scheme that combines the existing Boruta algorithm with an intelligent optimization algorithm and can select the optimal dimensionality reduction scheme for data with different characteristics. In addition, a coefficient for evaluating model stability is established, which complements existing dimensionality reduction theory and applications. |
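
The shadow-feature idea behind Boruta, which the one-stage design relies on, can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: the function names (`pearson_abs`, `boruta_hits`, `select_features`) are hypothetical, absolute Pearson correlation stands in for the importance scores that sub-models such as random forests would provide, and a plain hit-rate threshold replaces the statistical test used by Boruta. The second (optimization) stage and the stability coefficient are not covered.

```python
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score.

    Assumption: the paper's sub-models supply importances; this proxy only
    keeps the sketch dependency-free.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def boruta_hits(columns, target, n_rounds=20, seed=0):
    """Count, per feature, the rounds in which it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        # Shadow features: shuffled copies of the real columns, which break
        # any relationship with the target while keeping each distribution.
        shadow_best = 0.0
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_best = max(shadow_best, pearson_abs(shadow, target))
        # A real feature scores a hit when it outperforms every shadow.
        for j, col in enumerate(columns):
            if pearson_abs(col, target) > shadow_best:
                hits[j] += 1
    return hits


def select_features(columns, target, n_rounds=20, min_hit_rate=0.9, seed=0):
    """Keep features whose hit rate reaches a threshold (simplified rule)."""
    hits = boruta_hits(columns, target, n_rounds=n_rounds, seed=seed)
    return [j for j, h in enumerate(hits) if h / n_rounds >= min_hit_rate]
```

Here `columns` is a list of feature columns (each a list of numbers) and `target` is a list of the same length; `select_features` returns the indices of the retained columns. In a setting like the one described above, the importance function would be replaced by the chosen sub-model, which is where the differences between sub-models would show up.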