| Data mining techniques for detecting outliers in big data have achieved considerable results. However, anomaly detection algorithms remain limited in many practical applications: they incur high computational overhead on high-dimensional data, converge poorly, lack stability, and produce anomaly results that are hard to interpret. Moreover, when an aggregate indicator is found to be anomalous, locating the root cause efficiently and accurately is a major challenge, because it requires handling the interdependence between features and shortening the search path without missing the true cause. To address these problems, this thesis proposes a weighted iForest (isolation Forest) anomaly detection model based on stability updates and a multi-dimensional anomaly root cause localization model based on an improved UCT (Upper Confidence bound applied to Trees) algorithm. Taking student data from a university as an example, the thesis performs anomaly detection and root cause analysis on student behavior data and visualizes the results of the big data analysis, so as to enable proactive campus safety management and timely intervention in events that threaten safety. The main research contents of this thesis cover three aspects. First, a weighted iForest anomaly detection model based on stability updates is proposed. The Johnson-Lindenstrauss method is used to reduce the dimensionality of the data before anomaly detection, which addresses the instability of the iForest algorithm on high-dimensional data and reduces the computational cost. The model introduces weights to improve the interpretability of detected anomalies, and uses the PSI (Population Stability Index) to monitor model stability so that the model can be updated in a timely manner. Second, a multi-dimensional anomaly root cause localization model based on the improved UCT algorithm is proposed. Building on Monte Carlo tree search with upper confidence bounds, the model is combined with Gibbs sampling and parallel adaptive discrete particle swarm optimization to strengthen its search ability in a large search space. In addition, an improved possibility score is used to identify the root cause, and the search tree is pruned level by level according to this score as it is constructed, which avoids unnecessary search and improves search efficiency. Third, a portrait of abnormal behavior of college students is constructed. The weighted iForest anomaly detection model and the multi-dimensional root cause localization model are applied to the detection and root cause analysis of abnormal behavior among college students. Based on the model results, the abnormal behavior portrait is constructed and the data are displayed visually. |
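
The abstract names PSI as the signal that decides when the detection model is updated, but it does not give the computation. The sketch below shows one common way to compute PSI between a baseline sample of anomaly scores and a newer sample. It is an illustration under stated assumptions, not the thesis's implementation: the function name `psi`, the quantile binning, the bin count, and the smoothing constant `eps` are all hypothetical choices.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a newer sample.

    Assumption: bins are quantiles of the baseline sample, and empty bins
    are floored at `eps` so the logarithm is always defined.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    # Interior bin edges taken from the sorted baseline sample.
    ordered = sorted(expected)
    edges = [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # The bin index is the number of edges that v reaches or exceeds.
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]

    p_exp = proportions(expected)
    p_act = proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected).
    return sum((a - e) * math.log(a / e) for e, a in zip(p_exp, p_act))


# Hypothetical usage: compare baseline scores with a shifted batch.
baseline = [i / 100 for i in range(1000)]
shifted = [x + 3 for x in baseline]
same_psi = psi(baseline, baseline)
drift_psi = psi(baseline, shifted)
```

A PSI near zero means the two score distributions match. A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, which a monitoring loop could use as a retraining trigger; the threshold actually used in the thesis is not stated in the abstract.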