| With the increasing prevalence of the Internet and the adoption of big data, cloud computing, and blockchain technologies, information systems connected to cyberspace face unprecedented threats. Intrusion detection has been crucial to network security over the past few decades. However, attack samples and normal samples in network security data are severely imbalanced, which poses a major challenge to intrusion detection research. This imbalance biases models towards the category with more samples, makes feature extraction more difficult, and, from the perspective of simulation and modeling, results in low intrusion detection accuracy. Existing intrusion detection techniques often rely on a single data sampling method and incomplete feature screening, which limits improvements in detection performance. Additionally, a single intrusion detection model can detect only specific attack classes, making it difficult to improve attack classification accuracy. Current model integration methods, in turn, suffer from a high degree of coupling, which reduces the stability and reliability of the integrated model. To address these issues, this study proposes a two-stage intrusion detection method based on improved LightGBM (Light Gradient Boosting Machine) and enhanced K-means (K-means cluster analysis). Through an improved data pipeline structure, this method combines the high detection rate of the former with the accuracy of the latter, overcoming the former’s high false positive rate and the latter’s large time overhead and yielding significant advantages over previous methods. Furthermore, a stable and reliable multi-integration model is proposed for attack classification, which uses a new weakly coupled integration method to combine the strengths of LightGBM, XGBoost (eXtreme Gradient Boosting), and CatBoost (gradient boosting with categorical features support). The effectiveness of the proposed model is 
verified through simulation experiments, demonstrating applicability to intrusion detection and attack classification tasks, particularly in unbalanced sample scenarios. The research innovations of this paper can be summarized in three main points. (1) This paper proposes a two-stage intrusion detection method (STG2P) based on the improved LightGBM and enhanced K-means algorithms. STG2P enhances the output structure of LightGBM to improve the detection rate of the model in the presence of unbalanced samples. Moreover, this paper designs a method to automatically determine and update the hyperparameters of K-means to improve the accuracy of STG2P. Furthermore, STG2P uses a data pipeline to overcome the time-consuming nature of K-means. (2) This paper presents a novel multi-integrated attack classification model (MIM) that combines the LightGBM, CatBoost, and XGBoost algorithms. MIM uses random oversampling, random undersampling, and shuffling for data sampling, together with an improved simulated annealing algorithm for feature reconstruction, to mitigate the impact of sample imbalance on the model. Additionally, MIM employs a novel rule-based and priority-based weakly coupled model integration approach to combine the strengths of multiple models, resulting in superior accuracy. (3) This paper designs and implements an intrusion detection system based on ensemble learning that incorporates the two model frameworks, STG2P and MIM, into their corresponding modules. The system provides user-friendly visual interfaces for intrusion detection and attack classification tasks and can effectively analyze system terminal log data and network traffic data. |
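
The abstract does not spell out how MIM samples data or integrates its models, so the following is only a minimal sketch of two ideas it names: random over/undersampling with shuffling, and a priority-based combination of per-model labels. The function names `rebalance` and `priority_vote`, the `target` parameter, and the specific rule (majority wins, ties go to the highest-priority model) are assumptions made for illustration, not the paper's actual procedure. The combiner sees only each model's output label, which is one plausible reading of "weakly coupled".

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, labels, target, seed=0):
    """Randomly over- or undersample every class to `target` items, then shuffle.

    Hypothetical helper: the paper's exact sampling ratios are not given.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out = []
    for y, items in by_class.items():
        if len(items) >= target:
            # Undersample: draw `target` items without replacement.
            chosen = rng.sample(items, target)
        else:
            # Oversample: keep all items and add random duplicates.
            chosen = items + rng.choices(items, k=target - len(items))
        out.extend((x, y) for x in chosen)

    rng.shuffle(out)  # remove any class ordering before training
    return [x for x, _ in out], [y for _, y in out]


def priority_vote(predictions, priority):
    """Combine per-model labels using an assumed rule.

    `predictions` maps a model name to its predicted label; `priority`
    lists model names from most to least trusted. A strict majority label
    wins; otherwise the tied label voted by the highest-priority model wins.
    """
    counts = Counter(predictions.values())
    best = max(counts.values())
    tied = {label for label, c in counts.items() if c == best}
    if len(tied) == 1:
        return next(iter(tied))
    for name in priority:
        if predictions[name] in tied:
            return predictions[name]
    raise ValueError("no prioritized model voted for a tied label")


# Toy data: 10 "normal" samples and 2 "attack" samples.
X = [[i] for i in range(12)]
y = ["normal"] * 10 + ["attack"] * 2
X_bal, y_bal = rebalance(X, y, target=5)
print(Counter(y_bal))  # both classes now have 5 samples

# Three models disagree completely, so the priority order decides.
votes = {"lightgbm": "dos", "xgboost": "probe", "catboost": "u2r"}
print(priority_vote(votes, ["catboost", "lightgbm", "xgboost"]))  # u2r
```

Because the combiner depends only on label outputs and a priority list, any of the three models could be retrained or replaced without touching the others, which is the kind of stability a weakly coupled integration is meant to provide.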