| As the Web era continues to develop, the network security environment facing Internet users is becoming increasingly severe. The number of online accounts has exploded, so studying account risk is a question of practical significance. In this study, account risk has two aspects: for the user, whether the account is at risk of being stolen; for application services, whether the account is a malicious one that attacks applications and spreads illegal information. To address the high false alarm rate of account risk identification and the poor recognition of unknown abnormal behavior, this paper studies an account risk identification model. First, account risk identification requires data collection. The data collected in this paper consists of two parts: user behavior data and application log information. The user behavior data includes mouse behavior, keyboard behavior, account online time, page access order, etc. The application log information includes the IP address, domain name, page title, request protocol, browser-related information, etc. Data is collected through embedded JS scripts, which keeps data collection separate from the application services. The collected data is stored in the HDFS file system, and statistics are computed with MapReduce. Second, to address the high false positive rate of stolen account risk identification, this paper proposes an account risk identification method based on machine learning clustering and classification algorithms. The clustering step uses particle swarm optimization to improve the K-means algorithm, forming the PSOKmeans clustering algorithm. This avoids the sensitivity of the original K-means algorithm to its initial values. Compared with other improved K-means clustering algorithms, the proposed algorithm achieves a better clustering effect. Because the number of abnormal accounts is relatively small, direct classification performs poorly on the category with less data. Therefore, a cluster-then-classify approach is adopted to identify the risk of stolen accounts. After clustering, three classification algorithms (Naive Bayes, Decision Tree and Random Forest) are used to classify and identify account risks. The experimental results show that the Random Forest algorithm has the highest accuracy, reaching 90%. Finally, because the state changes of malicious accounts are currently hard to predict, an account risk identification method based on an improved hidden Markov model is proposed. The simulated annealing algorithm is used to improve the parameter learning process of the hidden Markov model, and the improved learning algorithm obtains better model parameters. The hidden state variables of an account are inferred from its observed variables in order to judge how the account's state changes within a given period of time. The experimental results show that the hidden Markov model can effectively predict the state changes of malicious accounts, and the accuracy of malicious account risk identification can reach 80%. The research in this paper provides an important reference for user account risk identification and helps advance it. |
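
The idea of using particle swarm optimization to reduce K-means' sensitivity to its initial values can be illustrated with a minimal sketch. The abstract does not give the paper's exact PSOKmeans formulation, so everything below is an assumption made for illustration: the function names `sse` and `pso_kmeans`, the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the particle count, and the choice to let PSO search for good starting centroids before a standard K-means (Lloyd) refinement. Each particle encodes one candidate set of `k` centroids, and its fitness is the sum of squared distances from every point to its nearest centroid.

```python
import random

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in centroids)
        for pt in points
    )

def pso_kmeans(points, k, n_particles=10, pso_iters=30, kmeans_iters=20,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Illustrative PSO-seeded K-means; all hyperparameters are assumed values."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate set of k centroids, seeded from data points.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in pos]
    pbest_fit = [sse(points, particle) for particle in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(pso_iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard PSO update: inertia + personal pull + global pull.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(points, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit

    # Refine the best swarm solution with ordinary K-means (Lloyd) iterations.
    centroids = gbest
    for _ in range(kmeans_iters):
        clusters = [[] for _ in range(k)]
        for pt in points:
            nearest = min(
                range(k),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])),
            )
            clusters[nearest].append(pt)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members
            else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids

# Toy data (not from the paper): two small grids of points far apart.
grid = [(-0.2 + 0.1 * i, -0.2 + 0.1 * j) for i in range(5) for j in range(5)]
points = grid + [(x + 10.0, y + 10.0) for x, y in grid]
centroids = sorted(pso_kmeans(points, k=2))
```

In a cluster-then-classify pipeline such as the one described above, the resulting cluster assignments would then be handed to a classifier; that step is omitted here because the abstract does not specify how the two stages are connected.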