Research And Implementation Of Counterfeit Website Recognition Based On URL And HTML Source Code

Posted on: 2021-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y He
GTID: 2518306572969559
Subject: Computer technology
Abstract/Summary:
With the birth and spread of the Internet, people's lifestyles have changed tremendously. The Internet has brought convenience to many aspects of daily life, such as clothing, food, housing, and transportation, but it has also introduced security risks. Users who register personal information on a website face the risk that this information will be leaked. Some criminals exploit many users' unfamiliarity with the Internet and lack of risk awareness: they imitate legitimate websites to trick users into clicking and browsing, steal personal information and bank card passwords, and then carry out illegal activities, causing users heavy economic losses and other harm. Identifying by technical means whether a website is counterfeit, and warning users when they click on it so that information leakage becomes less likely, therefore has both research significance and practical value.

This thesis studies counterfeit website identification from two aspects: how to determine whether a website is counterfeit, and how to find the target website that a counterfeit website imitates. For identification, features are extracted from multiple angles, several classifiers are compared in classification experiments, and a counterfeit website recognition system is designed and implemented. For target discovery, an association relationship mining algorithm is proposed that mines associations from multiple aspects, and a counterfeit target website discovery system based on these associations is designed and implemented. The main research contents are as follows.

First, taking the URL and the HTML source as the starting point, this thesis extracts a large number of features from multiple angles for distinguishing counterfeit websites from legitimate ones, and describes how each feature is extracted. The rationality of the extracted features is illustrated by analyzing how the corresponding feature values differ between counterfeit and legitimate websites.

Second, on the basis of feature extraction, this thesis applies feature selection methods to filter the features. The aim is twofold: to reduce the feature dimension, which improves model training speed and prediction accuracy, and to reduce the dependence of the extracted features on the data set. After analyzing the classification principles and applicable scenarios of several classifier models, this thesis designs a new classifier model, CNN-RF, based on convolutional neural networks and random forests. Experimental comparison with random forest, convolutional neural network, deep forest, and logistic regression classifiers verifies the advantages of CNN-RF in counterfeit website recognition. CNN-RF is then used as the classifier in further experiments that analyze how the number of samples and the choice of features influence the classification results.

Finally, this thesis proposes a counterfeit target website discovery algorithm and designs and implements a counterfeit target website discovery system. Because counterfeit websites deceive users by imitating legitimate websites, the association between a counterfeit website and each suspected target website is mined from links, keywords, website rankings, content, structure, and style similarity, and these associations are used to find the target website being imitated.

Experiments show that the extracted features and the classifier model used in this thesis can effectively identify whether a website is a phishing website. The proposed method, which relies on the association relationships between websites, can to a certain extent find the target website that a phishing website imitates.
Keywords/Search Tags: counterfeit website, feature extraction, random forest, convolutional neural network, relationship mining
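
The abstract describes extracting features from a site's URL and HTML source but does not list them. The following is a minimal Python sketch of what such extraction could look like, using only the standard library. The feature names (`url_length`, `host_is_ip`, `external_link_ratio`, and so on) and the helper names are illustrative assumptions chosen here, not the feature set or code used in the thesis.

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urlparse


def _host_of(url):
    """Return the hostname of a URL, or None if it has none or is malformed."""
    try:
        return urlparse(url).hostname
    except ValueError:
        return None


def _is_ip(host):
    """True if the host string is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


class _PageCounter(HTMLParser):
    """Counts links, external links, forms and password inputs in a page."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.links = 0
        self.external_links = 0
        self.forms = 0
        self.password_inputs = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links += 1
            host = _host_of(attrs["href"])
            # Relative links have no host and count as internal.
            if host and host != self.page_host:
                self.external_links += 1
        elif tag == "form":
            self.forms += 1
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.password_inputs += 1


def extract_features(url, html):
    """Hypothetical URL + HTML feature vector (illustrative features only)."""
    host = _host_of(url) or ""
    counter = _PageCounter(host)
    counter.feed(html)
    ratio = counter.external_links / counter.links if counter.links else 0.0
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "host_is_ip": _is_ip(host),
        "uses_https": url.lower().startswith("https://"),
        "num_links": counter.links,
        "external_link_ratio": ratio,
        "num_forms": counter.forms,
        "has_password_input": counter.password_inputs > 0,
    }
```

A dictionary like this could be turned into a numeric vector and passed to any of the classifiers named in the abstract. Which features are actually informative is what a feature selection step would need to establish.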