Research On Phishing Detection Based On The Link Features Of Website

Posted on: 2020-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: H P Yuan
GTID: 2428330596995048
Subject: Computer Science and Technology
Abstract/Summary:
Phishing is an attempt to trick users into disclosing sensitive information such as usernames, passwords, bank account details, and credit card numbers. According to the APWG Global Phishing Survey, phishing attacks are growing rapidly, and people are increasingly concerned about how to prevent them. To counter these attacks, this thesis proposes two detection methods.

The first method addresses shortcomings of traditional phishing detection techniques. Some of these techniques must analyze a large number of web pages, and the resulting time cost is too high for real-time detection. Others depend on third-party services, so an anomaly in those services can invalidate the detection result. The first method therefore analyzes only the features of the URL and some features of the internal links on the first layer of the web page. It then uses machine learning to classify the website as phishing or legitimate. The method also incorporates a meta-word search on a search engine, which identifies phishing websites quickly and locates the phishing target more accurately.

The second method is motivated by the observation that some phishing websites deploy anti-crawling mechanisms, which prevent detection techniques from retrieving the page content. In addition, extensive analysis shows that phishing URLs generally use convenient, cheaply priced domain names, whereas legitimate URLs tend to use representative domain names. The second method therefore works purely at the URL level. It uses the word2vec tool to learn vector representations of URLs automatically, without manually designed features, and then applies a machine learning classification algorithm to label the website as phishing or legitimate. Evaluated on a dataset of 1 million samples, this method achieves an accuracy above 99%, with detection times on the order of milliseconds. Compared with several traditional methods, it substantially improves both accuracy and speed.
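The abstract does not list the exact URL and link features used in the first method. As a rough illustration only, the sketch below uses Python's standard library to compute a few commonly cited lexical URL features, plus two ratios over the links found in the page's own HTML (the "first layer"). Every feature name here is a hypothetical choice, not the thesis's feature set. The search-engine meta-word step is not shown.

```python
# Illustrative sketch only (assumption): the feature names below are hypothetical
# examples of URL and first-layer link features, not the thesis's feature set.
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href/src/action attribute values from every start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value is not None:
                self.links.append(value.strip())


def _host_is_ip(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Lexical features computed from the URL string alone (no network access)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": int("@" in url),
        "has_hyphen_in_host": int("-" in host),
        "host_is_ip": int(_host_is_ip(host)),
        "uses_https": int(parsed.scheme == "https"),
    }


def link_features(page_url, html):
    """Features of the links found on the first layer (the page's own HTML)."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    links = collector.links
    page_host = urlparse(page_url).hostname
    total = len(links)
    # Links that lead nowhere: empty, "#", or javascript: pseudo-URLs.
    empty = sum(1 for link in links
                if link in ("", "#") or link.lower().startswith("javascript:"))
    external = 0
    for link in links:
        # Resolve relative links against the page URL before comparing hosts.
        host = urlparse(urljoin(page_url, link)).hostname
        if host is not None and host != page_host:
            external += 1
    return {
        "num_links": total,
        "empty_link_ratio": empty / total if total else 0.0,
        "external_link_ratio": external / total if total else 0.0,
    }
```

In a pipeline along these lines, the values of both dictionaries would be concatenated into one numeric vector and passed to a classifier. The abstract does not say which classification algorithm the thesis uses.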
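According to the abstract, the second method learns URL representations with the word2vec tool, and the keywords mention character embedding. The sketch below is a toy, pure-Python reimplementation of word2vec's skip-gram with negative sampling that treats each URL character as a token, then averages the character vectors into one fixed-length URL vector. It is not the thesis's code. The character tokenization, the mean pooling, and all hyperparameters (dim, window, negatives, epochs, lr) are assumptions. A real system would run the actual word2vec tool or an established library on the full corpus.

```python
# Toy sketch (assumption): character-level skip-gram with negative sampling,
# written from scratch for illustration; not the thesis's implementation.
import math
import random


def url_to_chars(url):
    """Treat each (lower-cased) character of the URL as one token."""
    return list(url.lower())


def train_char_word2vec(urls, dim=16, window=2, negatives=3, epochs=5, lr=0.05, seed=0):
    """Return a dict mapping each character to its learned input vector."""
    rng = random.Random(seed)
    corpus = [url_to_chars(u) for u in urls]
    vocab = sorted({c for seq in corpus for c in seq})
    index = {c: i for i, c in enumerate(vocab)}
    # Input (center) vectors start small and random; output (context) vectors start at zero.
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for seq in corpus:
            ids = [index[c] for c in seq]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One positive (true context) pair plus `negatives` random pairs.
                    targets = [(ids[ctx_pos], 1.0)]
                    targets += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    v = w_in[center]
                    grad_in = [0.0] * dim
                    for tgt, label in targets:
                        u = w_out[tgt]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_in[k] += g * u[k]  # uses u[k] before its update
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_in[k]
    return {c: w_in[index[c]] for c in vocab}


def url_vector(url, embeddings, dim):
    """Mean of the character vectors; characters unseen in training are skipped."""
    vecs = [embeddings[c] for c in url_to_chars(url) if c in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

For example, `emb = train_char_word2vec(urls)` followed by `url_vector(url, emb, dim=16)` yields a 16-dimensional vector that could be fed to a classifier. The abstract does not specify how the thesis pools URL vectors or which classifier it uses.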
Keywords/Search Tags: Phishing Website Detection, Machine Learning, Character Embedding, Phishing Targets, Meta-Words of Search Engine