Research On Intrusion Detection Model Based On Feature Selection And Machine Learning Algorithm

Posted on: 2024-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhou
GTID: 2558307100466204
Subject: Data intelligent analysis and application
Abstract/Summary:
With the rapid development of new technologies such as Big Data, the Internet of Things and 5G, the growing volume of network traffic has brought an increasing threat to network security. Network Intrusion Detection Systems (NIDS) are used to detect various attacks and thus protect network resources in a timely manner. In this paper, we study intrusion detection methods based on feature selection and machine learning algorithms. The main results are as follows.

(1) An intrusion detection model based on an improved ReliefF algorithm is proposed to address inadequate feature extraction, failure to consider the influence of feature weights, and inaccurate classification in existing intrusion detection algorithms. First, the improved ReliefF algorithm is obtained by optimizing the calculation of feature weights for intrusion data. A feature relevance scale is then established based on the Pearson correlation coefficient between features, and only one of any set of highly correlated features is retained, which achieves a secondary optimization of the features. Finally, decision tree (DT), K-nearest neighbors (KNN), random forest (RF), naive Bayes (NB) and support vector machine (SVM) classifiers are used to evaluate the classification performance and accuracy of the method on the resulting optimal feature subset. Experimental results on two datasets, NSL-KDD and UNSW-NB15, show that the method not only achieves better detection performance but also effectively reduces the feature dimensionality and has a positive impact on the computational complexity of the classifiers.

(2) Traditional feature selection algorithms do not scale well in intrusion detection because of the large amount of data. To address this, a feature selection model that combines PCA dimensionality reduction with clustering is proposed. First, the PCA algorithm is used to reduce the dimensionality of the network intrusion dataset. Four indicators (information gain, information gain ratio, ReliefF and symmetric uncertainty) are then used as the evaluation indicators for the K-means clustering algorithm to eliminate irrelevant features. These two steps of feature extraction and feature selection yield a feature subset with more discriminative power. The accuracy of the method is evaluated using five classifiers, and its accuracy for different numbers of relevant features is tested using percentage criteria. Experimental results on two datasets, NSL-KDD and UNSW-NB15, show that this method improves both the efficiency of classification models in network intrusion detection and their accuracy in detecting attack types.

(3) Network traffic data is usually imbalanced, and intrusion detection models trained on imbalanced traffic data often fail to identify rare types of attacks. To address this problem, this study proposes an intrusion detection model that combines feature selection and oversampling for both binary and multi-class intrusion detection tasks. The LightGBM feature selection algorithm is first used to select the most important features. An oversampling technique is then applied before the classifier is trained to adjust the ratio between attack categories, which makes the network traffic data more balanced and thus easier to classify. Finally, seven machine learning classification algorithms are used to detect attacks in the network intrusion dataset. To evaluate the classification performance of the model, tests were conducted on the NSL-KDD dataset. The results show that in binary classification the model achieves its highest detection rate, 99.82%, with the random forest algorithm. In multi-class classification, the accuracy of the model can be further improved by applying the synthetic minority oversampling technique (Random Over Sampler) to solve the class imbalance problem.
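
The oversampling step in (3) can be illustrated with a minimal sketch. This is not the thesis's implementation, which the abstract does not show. It assumes plain random oversampling, in which randomly chosen minority-class records are duplicated until every class matches the size of the largest class. The function name `random_oversample`, the toy records and the class labels are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is grown to the size of the largest one.
    target = max(len(idxs) for idxs in by_class.values())

    out_samples, out_labels = list(samples), list(labels)
    for label, idxs in by_class.items():
        # Draw with replacement to fill the gap to the majority count.
        for _ in range(target - len(idxs)):
            pick = rng.choice(idxs)
            out_samples.append(samples[pick])
            out_labels.append(label)
    return out_samples, out_labels


# Made-up traffic records (duration, bytes) with made-up labels.
X = [(0.1, 200), (0.2, 180), (0.1, 220), (0.3, 210), (5.0, 9000), (7.5, 40)]
y = ["normal", "normal", "normal", "normal", "dos", "u2r"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # per-class counts after balancing
```

Oversampling of this kind is normally applied to the training split only, so that duplicated records do not leak into the test data. A library implementation, such as `RandomOverSampler` in imbalanced-learn, would usually be preferred over a hand-written helper in practice.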
Keywords/Search Tags: weight optimization, imbalanced data, oversampling techniques, intrusion detection, classification