| With the rapid development of network technology, the types of traffic data and applications on the Internet have increased dramatically. How to effectively identify the various types of network traffic has therefore become a key issue in network management and control. Identifying each type of network traffic is, first of all, a problem of classifying network traffic data. In practice, classifying network traffic data requires attention to the imbalance among application types. In imbalanced network traffic, the majority classes generally correspond to the application types that most users use. Accurate identification of network traffic can help network operators provide better quality of service. Effective identification of minority traffic classes can be used to detect equipment failures, abnormal traffic, virus intrusions, and malicious attacks, thereby improving network security. First, the effect of different types of training sets on the classification of imbalanced network traffic is analyzed. The SMOTE + Tomek Link resampling method is applied to the original data set, and 7 balanced and imbalanced data sets are used as training sets for the XGBoost algorithm in order to examine how the balance of the training set affects the classification results. The experiments were evaluated on the test set and the validation set. The results show that a classification model trained on an imbalanced training set with the same class proportions has little effect on the classification results for imbalanced network traffic. A classification model trained on the balanced training set does not reduce overall classification accuracy, and it improves the accuracy and recall of the minority classes, giving a better classification result. Second, network traffic data has a large number of features. To improve the performance of the classifier, this paper proposes a feature selection method 
based on the chi-square test and symmetric uncertainty, which selects features from the imbalanced network traffic data and from the balanced data. Six experiments were performed on the training set to obtain 6 features, which were then used for classification with the XGBoost algorithm, and classification performance improved. Finally, this paper compares three classification algorithms, KNN, SVM, and the C4.5 decision tree, with the XGBoost algorithm. The feature selection method and the XGBoost classifier are combined to build a classification model for imbalanced network traffic data. Experiments show that, after parameter optimization, the model built with the tuned XGBoost algorithm and the feature selection method significantly improves the classification results for imbalanced network traffic. |
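
The two feature-scoring criteria named above can be illustrated with a minimal, dependency-free sketch. It computes the Pearson chi-square statistic and the symmetric uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), between a discrete feature and the class label. The toy data and the feature names `port_bucket` and `size_bucket` are hypothetical, continuous features are assumed to have been discretized beforehand, and the abstract does not say how the two scores are combined, so the sketch only reports each score and is not the paper's exact procedure.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def chi_square(x, y):
    """Pearson chi-square statistic of the contingency table of x and y."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for vx, cx in count_x.items():
        for vy, cy in count_y.items():
            expected = cx * cy / n
            observed = count_xy.get((vx, vy), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy, made-up data: an imbalanced label and two discretized features.
labels = ["web"] * 6 + ["p2p"] * 2
features = {
    # Hypothetical feature that separates the two classes perfectly.
    "port_bucket": ["low"] * 6 + ["high"] * 2,
    # Hypothetical feature that is independent of the label.
    "size_bucket": ["s", "l", "s", "l", "s", "l", "s", "l"],
}

# Map each feature name to its (chi-square, symmetric uncertainty) pair.
scores = {
    name: (chi_square(col, labels), symmetric_uncertainty(col, labels))
    for name, col in features.items()
}

for name, (chi2_stat, su) in scores.items():
    print(f"{name}: chi2={chi2_stat:.3f}, SU={su:.3f}")
```

In this toy case the feature that separates the classes scores high on both criteria, while the independent feature scores approximately zero on both. A selection step would keep the top-ranked features under whichever combination rule is chosen.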