| As people become ever more closely connected to the Internet, they enjoy its speed and convenience but are also exposed to its hidden security risks. Phishing, one of the most serious threats to network security, forges websites that imitate legitimate ones and lures users into visiting them in order to steal user information. As detection technology improves, phishing attacks keep changing as well, so detecting and identifying them effectively remains an active research topic in network security. To better adapt to these changes, and starting from the observation that phishing websites look similar to the legitimate websites they imitate, this thesis proposes a structural similarity measurement detection method and a content similarity measurement detection method, and implements a phishing detection system based on similarity measurement. First, a structural similarity measurement detection method based on the render tree is proposed. It approaches the problem from the perspective of web page layout similarity and addresses the problems of structural feature extraction, feature vector representation, and interpretability in current structural similarity detection methods. The page render tree is extracted as an effective structural feature of the web page. The n-gram algorithm and an improved TF-IDF algorithm with layout influence factors are used to generate highly discriminative feature vectors. The similarity between structural feature vectors is then computed with a similarity measurement algorithm to detect and identify phishing pages. Multiple comparative experiments show that this method detects phishing attacks well and improves precision, F1 score, and recognition rate. Second, a content similarity measurement detection method based on mixed attention and margin loss is proposed. It targets phishing websites that use evasion techniques, such as replacing text with images and obfuscating page information, and approaches the problem from the perspective of web page content similarity. To address the poor feature discriminability, slow model convergence, and poor model interpretability of current content similarity detection methods, an improved NFNet model based on a mixed attention mechanism and an additive angular margin loss function is proposed, which accelerates model convergence and enhances feature discriminability. The similarity between feature vectors is then computed to detect and identify phishing pages. Multiple comparative experiments show that this method outperforms other methods in detection and performs well in precision, F1 score, and recognition rate. Finally, building on the above work, a three-stage phishing detection process is designed, and a phishing detection system based on similarity measurement is implemented to detect phishing attacks comprehensively and quickly. Comparative experiments show that the system's three-stage detection method performs well, and operation of the system in practice shows that it runs stably and detects phishing attacks effectively. |
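
To make the structural similarity idea concrete, the following is a minimal sketch, not the thesis's implementation. It assumes each page's render tree has already been flattened into a sequence of node-type tokens, and it uses plain smoothed TF-IDF with cosine similarity in place of the improved TF-IDF with layout influence factors, whose definition is not given here. The function names `ngrams`, `tfidf_vectors`, and `cosine`, as well as the toy token sequences, are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Map each token sequence to a sparse {n-gram: weight} TF-IDF vector."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    # Document frequency: in how many pages each n-gram occurs.
    df = Counter(gram for c in counts for gram in c)
    total = len(docs)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        # Smoothed IDF stays positive even for n-grams found in every page.
        vectors.append({
            gram: (k / length) * (math.log((1 + total) / (1 + df[gram])) + 1)
            for gram, k in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical flattened render trees (node types in traversal order).
legit = ["html", "body", "div", "form", "input", "input", "button"]
suspect = ["html", "body", "div", "form", "input", "input", "button", "img"]
unrelated = ["html", "body", "table", "tr", "td", "td", "p"]

vecs = tfidf_vectors([legit, suspect, unrelated])
sim_suspect = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(f"legit vs suspect:   {sim_suspect:.3f}")
print(f"legit vs unrelated: {sim_unrelated:.3f}")
```

In this toy setting the suspect page shares almost all of its bigrams with the legitimate page, so it scores higher than the unrelated page. A real detector would compare that score against a threshold chosen on labelled data.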