
Research And Implementation On Joint Features And Intelligent Detection Algorithms Of Phishing Webpages

Posted on: 2019-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: X P Jia
Full Text: PDF
GTID: 2428330545457853
Subject: Computer application technology
Abstract/Summary:
Phishing webpage fraud is a major criminal tactic on the modern Internet. In recent years, the number of webpage attacks has risen significantly, hitting a record high in 2017. Attackers can deploy a webpage attack at very low cost and spread it on a large scale in a short period of time. To protect the information security of Internet users, it is crucial to study more accurate and faster automatic webpage detection methods that can withstand this fast-paced form of cyber attack. In this dissertation, the classification of phishing webpages is investigated using features derived from three sources: the URL, web content elements, and related information. Feature extraction, feature selection, and feature importance calculation are performed on these features. To let the classification models capture a richer, finer-grained description of web pages, a joint feature rate R (0 < R <= 1) is introduced to extend and combine the basic features. On this basis, a variety of basic classification models are implemented, and the ability of models trained on features of different dimensions to detect phishing webpages is systematically compared. First, the optimal-parameter version of each classification model is obtained by parameter tuning, and the classification results of these optimal-parameter models, trained on different joint features, are compared. Second, the best classification model is determined by comparing the optimal-parameter models with one another. In addition, the selected model is compared with existing related research results. The results show that the random forest and neural network models achieve excellent detection performance.

This dissertation also proposes an improved self-training method for semi-supervised learning. The method divides a large unlabeled dataset evenly into multiple sub-datasets and trains the classification model on these sub-datasets sequentially. The detection accuracy and running time of four common classification models under the improved self-training method are compared. Compared with the traditional self-training method, the improved method detects phishing webpages effectively and, while keeping classification performance on par with the traditional method, reduces running time by more than 50%. This offers a new approach to the shortage of large-scale, reliably labeled data and to online detection.
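
The chunked self-training idea described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: a nearest-centroid classifier stands in for the four classification models, the confidence threshold for accepting pseudo-labels is an assumption (the abstract does not state the selection rule), and the names `fit_centroids`, `predict_with_confidence`, `chunked_self_training`, and `make_blobs`, as well as the synthetic two-dimensional features, are hypothetical.

```python
import math
import random


def fit_centroids(X, y):
    """Fit a nearest-centroid classifier; returns {label: centroid}."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_with_confidence(centroids, features):
    """Return (label, confidence). Confidence is the relative margin between
    the two nearest centroids, in [0, 1]. Requires at least two classes."""
    dists = sorted((math.dist(features, c), label) for label, c in centroids.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    confidence = (d2 - d1) / (d2 + d1) if d2 + d1 > 0 else 0.0
    return label, confidence


def chunked_self_training(X_lab, y_lab, X_unlab, n_chunks=4, threshold=0.5):
    """Split the unlabeled pool into roughly equal chunks and process them in
    order: pseudo-label one chunk, keep confident predictions, retrain once."""
    X_train, y_train = list(X_lab), list(y_lab)
    chunk_size = max(1, math.ceil(len(X_unlab) / n_chunks))
    model = fit_centroids(X_train, y_train)
    for start in range(0, len(X_unlab), chunk_size):
        for features in X_unlab[start:start + chunk_size]:
            label, confidence = predict_with_confidence(model, features)
            if confidence >= threshold:  # assumed acceptance rule
                X_train.append(features)
                y_train.append(label)
        model = fit_centroids(X_train, y_train)  # one retrain per chunk
    return model


def make_blobs(rng, n_per_class):
    """Synthetic stand-in data: label 0 = legitimate, label 1 = phishing."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X_lab, y_lab = make_blobs(rng, 5)      # small labeled seed set
    X_unlab, _ = make_blobs(rng, 200)      # labels discarded
    rng.shuffle(X_unlab)
    model = chunked_self_training(X_lab, y_lab, X_unlab, n_chunks=4)
    print(predict_with_confidence(model, [3.8, 4.1]))
```

In this sketch each unlabeled sample is scored exactly once and the model is retrained once per chunk, so the number of retraining rounds is fixed by `n_chunks`. A conventional self-training loop instead rescans the whole unlabeled pool on every iteration, which is one plausible source of the running-time difference the abstract reports.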
Keywords/Search Tags:Phishing webpage detection, Machine learning, Joint feature, Optimal classification model, Self-Training