| With the rapid development of Internet technology, network security issues have become more complex and changeable. In power information systems, various means of intrusion threaten the information network environment. Network security intrusion detection can dynamically monitor the security status of the power information network as a whole, and so plays a vital role in maintaining that environment. The application of data mining technology creates opportunities for research on large-scale intrusion detection in power information networks. At present, machine-learning-based intrusion detection methods for power information networks face two problems. First, redundant data and irrelevant features in the intrusion detection data set reduce detection precision. Second, the intrusion detection data is usually imbalanced, which leads to a low recall rate for the minority attack classes. This paper therefore uses four metrics, accuracy, recall, precision and F1 score, to evaluate the detection of abnormal attacks in the information network, and mainly completes the following work: (1) The RF, GBDT, AdaBoost and XGBoost machine learning algorithms were applied to the data set. After tuning and comparison, the advantages and disadvantages of each algorithm were analyzed, and random forest, which performed relatively well, was selected as the base algorithm for the subsequent experiments. (2) Based on this analysis, a one-vs-one (OVO) network intrusion detection algorithm based on recursive feature elimination and principal component analysis, RFECV-PCA-OVO, is proposed. It uses recursive feature elimination with random forest as the estimator to reduce the number of features, which improves the evaluation metrics and makes the feature set better suited to network data set detection; it then reduces the data dimension with PCA, and finally classifies with a one-vs-one detection model. The experimental results show that the precision for the minority attack types and the other evaluation metrics are improved. (3) To address the multi-class imbalance of the data set, a hybrid method is used. The SMOTE (Synthetic Minority Oversampling Technique) algorithm is used to expand the samples of the minority attack classes. Because data expanded by SMOTE is prone to distribution marginalization, the algorithm is improved by combining it with the K-means algorithm. Finally, the model from the previous chapter is used for calculation and comparison. The experimental results show that the recall rate is improved and the detection performance of the model is further improved. Experiments on the KDD CUP99 data set show that, compared with other models, the proposed model achieves better results both on the whole data set and on the classification of the minority attack classes. |
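
The four evaluation metrics named in the abstract, and the reason accuracy alone can hide poor detection of minority attack classes, can be illustrated with a minimal sketch. The helper name `per_class_metrics` and the toy labels are assumptions made for illustration only; they are not taken from the paper and the numbers are not KDD CUP99 results.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and {label: (precision, recall, f1)}.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correctly detected as this class
        else:
            fp[pred] += 1           # wrongly assigned to the predicted class
            fn[truth] += 1          # missed for the true class

    accuracy = sum(tp.values()) / len(y_true)

    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]   # everything labelled as this class
        actual = tp[label] + fn[label]      # everything truly in this class
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return accuracy, report


# Toy, made-up imbalanced labels: 95 normal records and 5 rare attacks.
y_true = ["normal"] * 95 + ["u2r"] * 5
# A hypothetical classifier that catches only 1 of the 5 rare attacks.
y_pred = ["normal"] * 95 + ["u2r"] + ["normal"] * 4

accuracy, report = per_class_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")
for label, (precision, recall, f1) in report.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the overall accuracy is 0.96 while the recall for the rare class is only 0.2, which is the kind of gap that motivates reporting per-class recall and F1 alongside accuracy on imbalanced intrusion data.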