
Real-time Detection Technology Of Malicious URL Based On Machine Learning

Posted on: 2019-02-20 | Degree: Master | Type: Thesis
Country: China | Candidate: P P Yang | Full Text: PDF
GTID: 2348330542498748 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of big data, network security problems are intensifying worldwide, and many network applications face various forms of security threats and network attacks. As traditional security detection devices increasingly struggle to cope with new attacks, applying artificial intelligence and data science to network security has become a hot topic. This thesis studies existing URL detection technology and applies machine learning algorithms to the recognition of URL attack behavior. A multi-classification model of malicious URLs based on machine learning is proposed. First, a multi-class training set is obtained using WAF (web application firewall) rules. Then grammatical features and domain features are extracted as feature vectors and fed to several classification algorithms. The performance of the multi-classification model is verified by cross-validation, and the random forest model performs best. Finally, the proposed feature extraction method is compared with other methods: it outperforms bag-of-words (BoW) and TF-IDF and performs well in detecting many kinds of attacks in URLs.

Because it is difficult to acquire a large number of labeled samples in an actual production environment, and because the detection model needs to be updated iteratively, this thesis also proposes an improved semi-supervised algorithm based on Co-Forest. Under certain constraints, high-confidence and low-confidence samples are processed at the same time. This addresses the shortcoming of earlier co-training, in which low-confidence samples did not take part in training, and makes full use of the information those samples carry. Compared with Co-Forest, the improved algorithm achieves better classification results.

Given the large data volumes and the need for minimal detection delay in actual security detection scenarios, the malicious URL behavior classification model and the improved Co-Forest algorithm are applied to a real-time detection system. A flow acquisition module, a real-time flow calculation module, and a detection output module are designed and implemented. The real-time detection system takes full advantage of Spark parallel computing, so data preprocessing, feature extraction, and model testing are fully parallelized. To verify its performance, the system was tested and analyzed; the accuracy of the detection system is above 98%.
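The abstract does not list the grammatical features it extracts from each URL, so the following is only a minimal sketch of what such a feature extraction step might look like. The function name `extract_features`, the keyword lists, and every feature in the returned dictionary are illustrative assumptions, not the thesis's actual feature set.

```python
import re
from urllib.parse import urlsplit, unquote

# Hypothetical token lists; the thesis does not enumerate its features.
SQL_KEYWORDS = ("select", "union", "insert", "drop", "sleep")
XSS_TOKENS = ("<script", "onerror", "javascript:", "alert(")


def extract_features(url: str) -> dict:
    """Map a raw URL to a small dictionary of lexical features (assumed set)."""
    parts = urlsplit(url)              # split before decoding so %26 / %3F stay inside values
    decoded = unquote(url).lower()     # percent-decode so encoded payloads become visible
    return {
        "url_length": len(decoded),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "num_params": len([p for p in parts.query.split("&") if p]),
        "num_digits": sum(ch.isdigit() for ch in decoded),
        "num_special": len(re.findall(r"[<>'\"();%=]", decoded)),
        "sql_keyword_hits": sum(kw in decoded for kw in SQL_KEYWORDS),
        "xss_token_hits": sum(tok in decoded for tok in XSS_TOKENS),
        "has_path_traversal": int("../" in decoded),
    }


if __name__ == "__main__":
    sample = "http://example.com/item.php?id=1%20UNION%20SELECT%20password"
    print(extract_features(sample))
```

Each dictionary can be turned into a fixed-order numeric vector and passed to any classifier, such as a random forest. Unlike BoW or TF-IDF, hand-written features like these encode attack-specific structure directly, which is the kind of comparison the abstract describes.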
Keywords/Search Tags: machine learning, URL multi-classification, Co-Forest, real-time detection