With the increasing popularity of the Internet, the functions of Web servers are becoming more and more complete. Large-scale Web user behavior data is generated every day and stored on servers. In recent years, the analysis of abnormal Web user behavior has become one of the important issues of network security in China and a hot topic of academic research. Analyzing the abnormal behavior of Web users from large-scale data and identifying its category is not only important for managing public opinion, but also an important means for network platforms to maintain a secure network environment. Abnormal behavior of Web users continually affects the security and stability of Web pages, so Web page administrators must analyze a large number of log records every day and constantly spend time and resources on maintenance.

Research in China and abroad on abnormal Web user behavior mainly relies on enterprise integrated log analysis software and on models built with various machine learning or ensemble learning algorithms. Although log analysis software provides an effective solution for log collection and analysis, installing and configuring it is cumbersome because of its architecture and configuration requirements. Current machine learning approaches do not analyze logs in sufficient depth: in feature extraction they extract only part of the key features without effectively removing irrelevant ones, and they fail to effectively identify the categories of abnormal Web user behavior. Existing ensemble learning models also suffer from inadequate feature extraction, failure to effectively reduce feature dimensionality, and a low degree of associated feature combination.

To solve the above problems, this thesis starts from extracting the key features of abnormal Web user behavior in order to improve the classification accuracy of abnormal
behavior of Web users. Based on the idea of ensemble learning, a high-precision model is built. The main contributions are as follows.

First, the mRMR algorithm is optimized. A standard normalized mutual information function is introduced to measure redundancy between features, which improves the sensitivity of the vector. A saliency function is introduced in the incremental search for new features, and a key instance set replaces the original instance set. Feature selection is then carried out based on the relevance and redundancy of features to obtain a better subset of abnormal Web user behavior features.

Second, a two-layer mRMR-XGBoost model is constructed. The first layer screens the feature set with the optimized mRMR algorithm and outputs a high-quality feature subset of abnormal Web user behavior. In the second layer, new CART decision trees are generated through the feature combination of the XGBoost algorithm, and the regularization term built into XGBoost is used to prevent overfitting. A greedy algorithm handles the node splitting of each tree and generates multiple regression trees; the k-th tree fits the residual of the first k-1 trees, and an optimal tree is finally generated.

Finally, three categories of data, namely SQL injection attacks, network directory scanning attacks, and XSS injection, are extracted from A Realistic Cyber Defense Dataset (CSE-CIC-IDS2018) to form the experimental dataset of abnormal Web user behavior. First, the mRMR algorithm is compared experimentally before and after optimization; the comparison of convergence speed and feature classification accuracy shows that the optimized mRMR algorithm performs better in feature processing. Second, the XGBoost single-layer model is compared experimentally with the mRMR-XGBoost two-layer model. The results show that the mRMR-XGBoost
two-layer model has higher accuracy in the analysis of abnormal Web user behavior. Finally, the mRMR-XGBoost two-layer model and the XGBoost two-layer model are compared in an anti-noise experiment. The results show that the mRMR-XGBoost two-layer model has better noise resistance and stability in the analysis of abnormal Web user behavior, and that it also classifies faster. In conclusion, the mRMR-XGBoost two-layer model proposed in this thesis offers better performance and robustness in the analysis of abnormal Web user behavior.
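
The feature-selection idea described above (greedy mRMR with a normalized mutual information redundancy term) can be illustrated with a minimal sketch. This is not the thesis's optimized algorithm: the saliency function and the key instance set are omitted, and the normalization I(X;Y)/sqrt(H(X)H(Y)) is only one common definition, assumed here because the abstract does not give the formula. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(xs):
    """Shannon entropy (in nats) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    """Mutual information I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs, ys):
    """Assumed normalization: I(X; Y) / sqrt(H(X) * H(Y)), 0 if an entropy is 0."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0
    return mutual_info(xs, ys) / sqrt(hx * hy)


def mrmr_select(features, labels, k):
    """Greedy mRMR over discrete features given as {name: list of values}.

    Each step picks the feature with the largest
    relevance (MI with the label) minus mean redundancy
    (normalized MI with the features already selected).
    """
    relevance = {name: mutual_info(col, labels) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                normalized_mi(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a", "c" is independent of both.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrmr_select(features, labels, 2))
```

In this toy case the duplicate feature "b" is passed over in the second step because its redundancy with "a" is maximal, which is the behavior the redundancy term is meant to produce. Continuous features, such as the flow statistics in CSE-CIC-IDS2018, would need to be discretized before a sketch like this could be applied.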
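
The residual-fitting step in the second layer can be written out using the standard XGBoost formulation. The notation below is not taken from the thesis: the loss $l$, the leaf count $T$, the leaf weights $w$, and the penalties $\gamma$ and $\lambda$ are the usual symbols and are assumed here for illustration.

```latex
% Additive model: the k-th tree f_k is added to the prediction of the first k-1 trees.
\hat{y}_i^{(k)} = \hat{y}_i^{(k-1)} + f_k(x_i)

% Objective minimized when building the k-th tree, with the built-in regularization term.
\mathrm{Obj}^{(k)} = \sum_{i=1}^{n} l\bigl(y_i,\ \hat{y}_i^{(k-1)} + f_k(x_i)\bigr) + \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2

% With squared loss l(y, \hat{y}) = (y - \hat{y})^2, the k-th tree fits the residual
r_i^{(k)} = y_i - \hat{y}_i^{(k-1)}
```

Under squared loss, minimizing the objective amounts to regressing $f_k$ on the residuals $r_i^{(k)}$ while $\Omega$ penalizes trees with many leaves or large leaf weights, which is how the regularization term limits overfitting.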