In recent years, the Internet has continued to grow while facing serious security threats. Network intrusion detection technology can play a major role in protecting network security by detecting various attacks in time. However, as the number of network traffic features increases, the redundant and irrelevant features among them severely limit the performance of intrusion detection systems. Feature selection can remove redundant and irrelevant features from the data, reducing its dimensionality and improving model classification performance. However, for filter and embedded feature selection methods, the number of selected features depends on a threshold or ratio set manually from empirical knowledge; wrapper methods obtain feature subsets directly, but have high time complexity. At the same time, different feature selection methods exhibit a certain bias when evaluating features. Although existing ensemble methods can reduce this bias, they do not account for the performance differences between methods, which leads to unsatisfactory classification results. To address these problems, this thesis investigates the feature selection problem in network intrusion detection systems and designs two effective feature selection methods. The main contributions of this thesis are as follows.

(1) An automatic feature selection method based on extremely randomized trees is proposed to address the problems that existing feature selection methods require empirical knowledge and a time-consuming search for subsets. First, the importance of each feature is obtained by training the extremely randomized trees algorithm on the dataset, and the features are ranked in descending order of importance. The features are then added in turn to the subsets to be evaluated, and the classification accuracy of each subset is calculated. To reduce the number of subsets evaluated, only the candidate subsets whose classification accuracy exceeds that of the original feature set are retained. Finally, a normalized mixed score is designed to evaluate the performance of the different subsets comprehensively, and the subset with the highest normalized mixed score is taken as the final selected feature subset. The effectiveness of the method is verified by comparing it with other feature subset selection strategies on the UNSW-NB15, CIC-IDS2017, and CSE-CIC-IDS2018 datasets.

(2) Although ensemble feature selection can reduce the bias of individual methods, existing ensembles do not consider the differences in classification performance among the subsets selected by different methods, which leads to high complexity of the ensemble. This thesis therefore proposes an elite ensemble automatic feature selection method. First, five classical feature selection methods are improved with the automatic feature selection strategy, and a feature subset is obtained from each method. An elite strategy then selects the two subsets with the best classification performance among the five. Finally, a boolean combination strategy integrates the two subsets by taking their intersection and their union, and both combinations are used to train the classification model. The effectiveness of the method is validated on the UNSW-NB15, CIC-IDS2017, and CSE-CIC-IDS2018 datasets, and the experimental results show that the feature subset selected by the elite ensemble method improves the performance of the classification model.
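The incremental search described in contribution (1) can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis implementation: the exact form of the normalized mixed score is not given in the abstract, so the `mixed_score` function below (accuracy penalized by subset size) is a hypothetical placeholder.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a network traffic dataset.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Step 1: rank features in descending order of extra-trees importance.
ranker = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(ranker.feature_importances_)[::-1]

def subset_accuracy(cols):
    """Cross-validated accuracy of a classifier trained on the given columns."""
    model = ExtraTreesClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X[:, cols], y, cv=3).mean()

baseline = subset_accuracy(list(range(X.shape[1])))  # full feature set

# Step 2: grow subsets by adding features in importance order;
# retain only subsets that beat the original feature set.
candidates = []
for k in range(1, len(order) + 1):
    cols = order[:k]
    acc = subset_accuracy(cols)
    if acc > baseline:
        candidates.append((cols, acc))

# Step 3: hypothetical "normalized mixed score" trading accuracy
# against subset size (the thesis defines its own formula).
def mixed_score(cols, acc):
    return acc - 0.1 * len(cols) / X.shape[1]

if candidates:
    best_cols, best_acc = max(candidates, key=lambda t: mixed_score(*t))
else:
    best_cols, best_acc = order, baseline  # fall back to the full ranking

print(len(best_cols), round(best_acc, 3))
```

The early-discard step matters for cost: only subsets that already beat the full-feature baseline are scored, so the final comparison runs over a short candidate list rather than all prefixes.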
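The elite ensemble in contribution (2) can likewise be sketched. The abstract names only "five classical feature selection methods", so the five base scorers below (mutual information, ANOVA F-test, chi-squared, random-forest importance, L1-regularized logistic regression) and the fixed subset size `k` are illustrative assumptions; the thesis sizes each subset automatically.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
X_pos = X - X.min(axis=0)  # chi2 requires non-negative values
k = 8                      # illustrative per-method subset size

def top_k(scores):
    return frozenset(np.argsort(scores)[::-1][:k])

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
lr = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)

# One subset per base method (placeholders for the thesis's five methods).
subsets = [
    top_k(mutual_info_classif(X, y, random_state=0)),
    top_k(f_classif(X, y)[0]),
    top_k(chi2(X_pos, y)[0]),
    top_k(rf.feature_importances_),
    top_k(np.abs(lr.coef_).ravel()),
]

def acc(cols):
    model = ExtraTreesClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X[:, sorted(cols)], y, cv=3).mean()

# Elite strategy: keep the two subsets with the best accuracy.
elite = sorted(subsets, key=acc, reverse=True)[:2]

# Boolean combination: evaluate both the intersection and the union.
inter, union = elite[0] & elite[1], elite[0] | elite[1]
final = inter if (inter and acc(inter) >= acc(union)) else union
print(sorted(final))
```

Restricting the boolean combination to the two elite subsets keeps the ensemble cheap: instead of combining all five subsets pairwise, only one intersection and one union are ever evaluated.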