
Research On A Detection Method Of Malicious URL Based On Ensemble Learning

Posted on: 2023-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: C X Zhao
Full Text: PDF
GTID: 2568307094989599
Subject: Applied statistics
Abstract/Summary:
Internet technology is continually updated under the impetus of technological innovation, which has driven unprecedented growth in the information industry. Web browsing, online shopping, and similar activities have become inseparable from daily life. While websites continue to enrich people's lives and bring convenience, many malicious websites have also emerged. A malicious website is one on which criminals use technical means to threaten visitors' private information and property without the visitors' knowledge. Such websites seriously damage the network environment and endanger the security of individuals and of society. Malicious websites are highly deceptive, short-lived, and quickly updated, so effective methods for identifying them, that is, for identifying malicious URLs, are urgently needed.

In the field of malicious URL detection, blacklist-based and rule-based solutions are widely used in industry, for instance in Google Safe Browsing, PhishTank, and other projects. However, these methods have shortcomings; in particular, rule-based methods make it easy for malicious website developers to find loopholes and bypass the rules. In recent years, machine learning methods for detecting malicious URLs have become very popular. These methods can achieve high accuracy and robustness, but their performance depends on feature design and model selection. To address these problems, this thesis designs a malicious URL detection solution based on ensemble learning. The specific technical contributions are as follows:

(1) Data set: a real-world data set of 21,474 URLs was collected and compiled through web crawling and other channels. The data was newly collected, with timeliness and authenticity taken into account.
(2) Multi-source features: the solution combines URL features, host features of the web page, and features of the web page's HTML source code. The URL and HTML source code features do not depend on any third-party services and are lightweight, which reduces the burden of model training.
(3) Model: the thesis designs a Stacking model that uses four machine learning models, SVC, AdaBoost, GBDT, and Naive Bayes, as primary classifiers. This diverse structure lets the different models complement each other and improves the overall detection performance of the system.

On the data set used in this thesis, the Stacking model achieves 97.80% accuracy. Among the individual machine learning algorithms, Naive Bayes performs best; in the comparison between the Stacking model and the single models, the ensemble model achieves the best results on the evaluation metrics. The experiments show that the feature construction strategy and the malicious URL detection method proposed in this thesis are feasible.
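The abstract describes the URL features as lightweight and independent of third-party services, but it does not list them. As a rough illustration only, the sketch below computes a few commonly used lexical URL features with the Python standard library. The function name `url_lexical_features` and every feature in it are assumptions made for this example, not the feature set used in the thesis.

```python
import re
from urllib.parse import urlsplit


def url_lexical_features(url: str) -> dict:
    """Compute a few lexical features from a URL string alone.

    Hypothetical feature set for illustration; the thesis abstract
    does not specify which URL features it actually uses.
    """
    # urlsplit only recognizes the host if a scheme separator is present,
    # so prepend a default scheme to bare inputs such as "example.com/a".
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special_chars": sum(ch in "@-_%=&?" for ch in url),
        # A dotted-quad IPv4 address used in place of a domain name.
        "has_ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "uses_https": parts.scheme == "https",
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "query_length": len(parts.query),
    }


if __name__ == "__main__":
    for sample in ("http://192.168.0.1/login/update?id=1", "https://example.com/"):
        print(sample, url_lexical_features(sample))
```

Because these features are derived from the URL string itself, they need no network requests or external lookups. A feature vector of this kind could then be combined with host and HTML features before being passed to the primary classifiers.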
Keywords/Search Tags: Malicious URL Detection, Web Security, Multi-feature, Machine Learning, Stacking Model