With the development of the Internet and the digital economy, networks have become part of people’s daily lives. However, frequent network security incidents not only cause great losses to people but also damage the network environment. Therefore, how to reduce these losses and create a safe network environment through network security technology has become a focus of attention. As a widely used network security technology, intrusion detection identifies intrusion behaviors so as to actively defend against network attacks and effectively protect network security. In recent years, machine learning has been applied to intrusion detection and has greatly improved its efficiency. This thesis studies intelligent intrusion detection technologies based on data reduction and aims to improve the performance of network intrusion detection by using data reduction techniques and one-class classification algorithms. The specific research content and innovations are as follows:

(1) A mutual information feature selection algorithm based on a redundancy penalty between features is proposed. To address the large number of features, many of them redundant, in high-dimensional data, a new mutual information feature selection algorithm based on redundancy penalty (RPFMI) is proposed. RPFMI considers three factors: the correlation between features, the relationship between the already selected features and the class labels, and the relationship between candidate features and the class labels. On this basis, a new expression for the redundancy penalty between features and a new mutual-information-based expression for feature importance are derived, from which the RPFMI algorithm is constructed. Experimental results show that RPFMI is an efficient feature selection method for intrusion detection.

(2) Sample selection algorithms based on representativeness are proposed. To address the large volume and redundancy of intrusion detection data, two sample selection algorithms based on sample representativeness, RBIS and RBIS-IM, are proposed. First, by analyzing three factors that affect sample selection, namely the influence of all samples of the same class on the selected sample, the influence of samples of different classes on the selected sample, and the influence of samples of different classes as a favorable factor, a new expression of sample representativeness is proposed to quantify the importance of each sample. Second, for balanced data, RBIS selects representative subsets of the same proportion from the normal samples and the attack samples, improving data quality and reducing data scale so as to improve intrusion detection performance. For imbalanced data, RBIS-IM selects the same number of normal samples and attack samples, improving data quality and resolving the class imbalance. Finally, on benchmark intrusion detection data sets, RBIS achieves a better trade-off between accuracy and reduction rate than the compared algorithms, and RBIS-IM achieves a better trade-off between balanced accuracy and reduction rate.

(3) An anomaly detection algorithm based on a deep autoencoder and a one-class neural network is proposed. To address the information loss caused by data compression, and the question of how the one-class classifier should be related to the autoencoder in one-class classification, a new anomaly detection algorithm, DAE-OCNN, is proposed. To compensate for the information lost when the deep autoencoder (DAE) compresses the data, a new data composition method is introduced: the new data consist of two parts, the compressed data and the multi-layer reconstruction error, and a new error distance is defined to express the reconstruction-error part. A unified shared loss function is used to optimize the parameters of both the deep autoencoder and the one-class neural network. In other words, the two networks are coupled during training, which improves intrusion detection performance.
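The abstract does not give the RPFMI formulas, so the following is only a minimal sketch of greedy mutual-information feature selection with a redundancy penalty, in the spirit of point (1). It assumes discrete features and labels. The penalty term (the redundancy I(f;s) with each selected feature s, weighted by that feature's normalized relevance I(s;C)/H(C)) is a hypothetical stand-in that touches the three factors listed in point (1); it is not the RPFMI expression, and `select_features` is an illustrative name.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy H(X) in bits of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_features(columns, labels, k):
    """Greedy forward selection of k feature names.

    columns: dict mapping feature name -> list of discrete values.
    The penalty below is a hypothetical example, NOT the RPFMI formula.
    """
    h_c = entropy(labels) or 1.0  # avoid division by zero for constant labels
    relevance = {f: mutual_info(v, labels) for f, v in columns.items()}  # I(f;C)
    selected, remaining = [], set(columns)

    def score(f):
        if not selected:
            return relevance[f]
        # Redundancy I(f;s) with each selected s, weighted by I(s;C)/H(C).
        penalty = sum(
            mutual_info(columns[f], columns[s]) * relevance[s] / h_c
            for s in selected
        ) / len(selected)
        return relevance[f] - penalty

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with labels `y = [0, 0, 1, 1, 2, 2, 3, 3]`, a feature `a = y // 2`, an exact duplicate `b` of `a`, and `c = y % 2`, selecting two features returns `["a", "c"]`: `b` is penalized for its redundancy with `a`, while `c` carries complementary information about the label.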
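The selection scheme in point (2) can be sketched as "score every sample, then keep the top samples per class", with only the per-class quota differing between the balanced (RBIS) and imbalanced (RBIS-IM) cases. The representativeness score here is a hypothetical stand-in, not the thesis's expression: mean Gaussian similarity to the sample's own class minus `lam` times mean similarity to the other classes. The bandwidth `gamma`, the weight `lam`, and all function names are assumptions for illustration.

```python
import math


def similarity(x, z, gamma=1.0):
    """Gaussian (RBF) similarity; gamma is an assumed bandwidth."""
    return math.exp(-gamma * math.dist(x, z) ** 2)


def representativeness(i, X, y, lam=0.5):
    """Hypothetical score: mean similarity to the same class minus
    lam times mean similarity to the other classes."""
    same = [similarity(X[i], X[j]) for j in range(len(X)) if j != i and y[j] == y[i]]
    other = [similarity(X[i], X[j]) for j in range(len(X)) if y[j] != y[i]]
    s = sum(same) / len(same) if same else 0.0
    o = sum(other) / len(other) if other else 0.0
    return s - lam * o


def _select(X, y, quota):
    """Keep the quota(class_size) most representative samples of each class."""
    scores = [representativeness(i, X, y) for i in range(len(X))]
    chosen = []
    for label in sorted(set(y)):
        idx = [i for i in range(len(X)) if y[i] == label]
        idx.sort(key=lambda i: (-scores[i], i))  # highest score first, then index
        chosen.extend(idx[:quota(len(idx))])
    return sorted(chosen)


def select_same_ratio(X, y, ratio):
    """RBIS-style quota: the same proportion of every class (balanced data)."""
    return _select(X, y, lambda size: max(1, round(ratio * size)))


def select_same_count(X, y, n):
    """RBIS-IM-style quota: the same number of samples per class (imbalanced data)."""
    return _select(X, y, lambda size: min(n, size))
```

Scoring is O(n²) in the number of samples because every pair is compared, which is fine for illustration; a real intrusion detection data set would likely need sampling or neighbor search to make this step tractable.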
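The data composition and shared loss in point (3) can be illustrated with a forward pass only. In this sketch, which is an assumption rather than the DAE-OCNN implementation, each decoder layer is paired with its mirrored encoder layer, and the "error distance" is taken to be the relative Euclidean error ‖h − ĥ‖/‖h‖ (the thesis defines its own distance, which the abstract does not give). The one-class part uses the published One-Class Neural Network hinge objective, ½‖w‖² + ½‖V‖²_F + (1/ν)·mean(max(0, r − ⟨w, σ(V·feat)⟩)) − r; whether DAE-OCNN uses exactly this form, and the weight `lam` that couples it to the reconstruction loss, are assumptions. Biases are omitted, and actually minimizing the loss would need an autodiff framework.

```python
import math


def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def layer(W, v):
    """One dense layer with tanh activation (bias omitted for brevity)."""
    return [math.tanh(a) for a in matvec(W, v)]


def error_distance(h, h_hat):
    """Hypothetical error distance: relative Euclidean error ||h - h_hat|| / ||h||."""
    return math.dist(h, h_hat) / (math.hypot(*h) or 1.0)


def compose(x, enc, dec):
    """Return (new_data, x_hat): code z plus one error distance per mirrored layer."""
    hs = [list(x)]
    for W in enc:
        hs.append(layer(W, hs[-1]))
    recon = [hs[-1]]
    for W in dec:
        recon.append(layer(W, recon[-1]))
    depth = len(enc)
    # recon[k] mirrors encoder activation hs[depth - k]; k = depth is the input itself
    errors = [error_distance(hs[depth - k], recon[k]) for k in range(1, depth + 1)]
    return hs[-1] + errors, recon[-1]


def shared_loss(batch, enc, dec, V, w, r, nu=0.1, lam=1.0):
    """Reconstruction MSE + lam * OC-NN objective, as one scalar to minimize jointly."""
    mse, hinge = 0.0, 0.0
    for x in batch:
        feat, x_hat = compose(x, enc, dec)
        hidden = [0.5 * (1.0 + math.tanh(a / 2)) for a in matvec(V, feat)]  # sigmoid
        score = sum(wi * hi for wi, hi in zip(w, hidden))
        hinge += max(0.0, r - score)
        mse += sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    n = len(batch)
    reg = 0.5 * sum(a * a for a in w) + 0.5 * sum(a * a for row in V for a in row)
    oc_nn = reg + hinge / (nu * n) - r
    return mse / n + lam * oc_nn
```

Because both terms depend on the encoder and decoder weights, a gradient step on `shared_loss` would update the autoencoder and the one-class network together, which is the coupling the abstract describes, rather than training the autoencoder first and fitting the classifier afterwards.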