Research On A Detection Method Of Malicious URL Based On Ensemble Learning

Posted on:2023-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:C X Zhao

Full Text:PDF

GTID:2568307094989599

Subject:Applied statistics

Abstract/Summary:

Internet technology is constantly evolving under the impetus of technological innovation, which has led to unprecedented development of the information industry. Web browsing, online shopping, and similar activities have become inseparable from people's daily lives. While websites continue to enrich people's lives and bring convenience, many malicious websites have also emerged. Malicious websites are websites through which criminals use technical means to threaten visitors' personal privacy and property without the visitors' knowledge. These websites have seriously damaged the network environment and endanger the security of individuals and even of society. Malicious websites are highly deceptive, short-lived, and quickly updated. Effective methods are therefore urgently needed to identify malicious websites, that is, to identify malicious URLs.

In the field of malicious URL detection, blacklist-based and rule-based solutions are widely used in industry, for instance Google Safe Browsing and PhishTank. However, these methods have shortcomings; in particular, rule-based methods make it easy for malicious website developers to find loopholes and bypass the rules. In recent years, machine learning methods for detecting malicious URLs have become very popular. These methods offer high accuracy and robustness, but they depend on feature design and model selection. To address these problems, this thesis designs a malicious URL detection solution based on ensemble learning. The specific technical contributions are as follows:

(1) Data set: a real data set of 21,474 URLs was collected and compiled through web crawlers and other channels. The data was newly collected, taking timeliness and authenticity into account.

(2) Multi-source features: the solution combines URL features, web page host features, and web page HTML source code features. The URL and HTML source code features do not depend on any third-party services and are lightweight, reducing the burden of model training.

(3) Model: the thesis designs a Stacking model that uses four machine learning models, SVC, AdaBoost, GBDT, and Naive Bayes, as primary classifiers. This diverse structure enables the different machine learning models to complement each other and improves overall detection performance.

On the data set used in this thesis, the model achieves 97.80% accuracy. Among the single machine learning algorithms evaluated, Naive Bayes performs best; in the comparison between the Stacking model and the single models, the ensemble model achieves the best evaluation metrics. The experiments show that the feature construction strategy and the malicious URL detection method proposed in this thesis are feasible.
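
The abstract does not list the individual URL features, so the following Python sketch is only an illustration of what lightweight, string-only URL features might look like. Every feature name here (`url_length`, `host_is_ipv4`, `path_depth`, and so on) and the helper `url_lexical_features` are hypothetical and are not taken from the thesis. The sketch uses only the Python standard library and makes no network requests, which matches the abstract's point that URL features need no third-party services.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Matches a dotted-quad host such as "192.168.0.1" (shape check only,
# it does not verify that each octet is <= 255).
IPV4_RE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")


def url_lexical_features(url: str) -> dict:
    """Compute a few illustrative string-only features of a URL.

    Hypothetical feature set for illustration; not the thesis's features.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ipv4": bool(IPV4_RE.fullmatch(host)),
        "uses_https": parts.scheme == "https",
        # Number of non-empty path segments, e.g. "/a/b.php" -> 2.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "num_query_params": len(parse_qsl(parts.query, keep_blank_values=True)),
    }


if __name__ == "__main__":
    example = "http://192.168.0.1/login/verify.php?id=1&x=2"
    for name, value in url_lexical_features(example).items():
        print(f"{name}: {value}")
```

A feature dictionary of this kind could be converted to a numeric vector and concatenated with host and HTML features before being passed to the primary classifiers; how the thesis actually encodes and combines its features is not stated in the abstract.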

Keywords/Search Tags:

Malicious URL Detection, Web Security, Multi-feature, Machine Learning, Stacking Model

Related items

1	Research On Malicious URL Detection Based On Machine Learning
2	Research On Multi-objective Restricted Boltzmann Machine Model For Malicious Code Detection
3	Research On Malicious Web Page Recognition Based On Feature Fusion And Machine Learning
4	Android Malicious Code Detection Based On Integrated Multi-feature
5	Research On Malicious URL Recognition Based On Machine Learning And Its System Implementation
6	Credit Default Detection Based On Deep Heterogeneous Stacking Model
7	Research On Malicious URL Detection Technology Based On Machine Learning
8	Research And Implementation Of Encryption Malicious Traffic Detection Technology Based On Sequence Multi-granularity Feature
9	Research On Detection Algorithm Of Extortion Software Based On Machine Learning
10	Research On User Malicious Comments Detection Based On Machine Learning