
Research On Phishing Website Hierarchical Detection Based On Webpage Features

Posted on: 2020-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhu
Full Text: PDF
GTID: 2428330590452094
Subject: Information security
Abstract/Summary:
Phishing is the practice of tricking people into disclosing confidential financial information through websites that appear to belong to a bank, a credit card provider, or a similar institution. Attackers craft phishing websites that closely resemble legitimate sites, trick users into entering sensitive information, and profit illegally. A phishing site's entry point, layout, and content are so similar to those of a legitimate site that users often fail to notice the difference. With the rapid development of the Internet and e-commerce, online economic activity has become very common. At the same time, phishing websites have become increasingly rampant and cause ever more serious economic losses, making phishing an urgent social problem.

Research on phishing detection has produced many results, and researchers have proposed many detection techniques and solutions. However, attackers constantly update their techniques to make phishing websites harder to detect, so existing detection technologies cannot reliably catch new phishing websites. Phishing websites also tend to be short-lived, and traditional phishing detection technology cannot achieve ideal real-time detection from multiple levels and angles. This paper therefore studies phishing website detection technologies and proposes a hierarchical detection method for phishing websites based on webpage features. The main contents of this paper are as follows:

(1) A phishing website detection algorithm based on logo similarity is proposed as the first-layer detection method. This paper proposes building a logo blacklist and whitelist and performing similarity detection on logos. The blacklist and whitelist serve as the data basis for detection and make it easier to detect new phishing sites in the future. First, logos are partitioned by color and given a preliminary check, which greatly improves retrieval and matching efficiency on large data sets. Logos of the same color are then compared using survey-line detection, giving a relatively lightweight check with high efficiency and accuracy. In addition, a parallel processing scheme for the algorithm is designed for the cloud environment to raise the data processing rate and achieve real-time detection.

(2) A phishing website detection algorithm based on plagiarism is proposed as the second-layer detection method. First, features are analyzed and extracted, including keywords. Labels are then re-encoded to improve matching efficiency, reduce complexity, and save storage space. Finally, the plagiarism rate of a web page is calculated to decide whether it is phishing, and a parallelization scheme for the algorithm is designed for the cloud environment to improve detection efficiency.

(3) A phishing website detection algorithm based on an improved random forest algorithm is proposed as the third-layer detection method. Association rules are extracted to improve the random forest algorithm, which is then used to classify websites. The improved algorithm can detect many types of phishing sites, and its advantage is validated by experimental comparison. Cloud-based parallel processing is also introduced to improve data processing capacity and detection rate.
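The layered structure described above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that each layer either reaches a verdict or defers to the next one, and that the plagiarism rate is the share of a reference page's keywords found on the suspect page. The names `detect`, `plagiarism_rate`, `plagiarism_layer`, the `page` dictionary keys, and the threshold value are all hypothetical and are not taken from the thesis.

```python
from typing import Callable, Iterable, Optional, Sequence

# True = phishing, False = legitimate, None = this layer cannot decide.
Verdict = Optional[bool]
Layer = Callable[[dict], Verdict]


def plagiarism_rate(page_keywords: Iterable[str],
                    reference_keywords: Iterable[str]) -> float:
    """Share of the reference keywords that also appear on the suspect page.

    Assumed measure for illustration; the thesis's formula is not given
    in the abstract.
    """
    page = set(page_keywords)
    reference = set(reference_keywords)
    if not reference:
        return 0.0
    return len(page & reference) / len(reference)


def plagiarism_layer(reference_keywords: Iterable[str],
                     threshold: float = 0.8) -> Layer:
    """Build a second-layer check against one reference page (hypothetical)."""
    reference = set(reference_keywords)

    def layer(page: dict) -> Verdict:
        rate = plagiarism_rate(page.get("keywords", []), reference)
        # Flag as phishing at or above the threshold, otherwise defer.
        return True if rate >= threshold else None

    return layer


def detect(page: dict, layers: Sequence[Layer]) -> Verdict:
    """Run the layers in order and return the first decisive verdict."""
    for layer in layers:
        verdict = layer(page)
        if verdict is not None:
            return verdict
    return None  # no layer reached a decision


if __name__ == "__main__":
    layers = [
        # Stand-in for layer 1: a lookup in a hypothetical logo blacklist.
        lambda page: True if page.get("logo_id") in {"fake-bank-logo"} else None,
        plagiarism_layer({"login", "account", "password", "bank"}, threshold=0.75),
        # Stand-in for layer 3: a real system would call a trained classifier.
        lambda page: False,
    ]
    print(detect({"keywords": ["login", "account", "password"]}, layers))
```

Ordering the layers from cheapest to most expensive means a page that the first check already settles never reaches the later, costlier checks, which is one plausible reading of why a hierarchical design would help with real-time detection.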
Keywords/Search Tags: Phishing website, logo detection, webpage plagiarism, random forest algorithm, Hadoop