
Research On Phishing Email Identification Method

Posted on: 2024-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Xiao
Full Text: PDF
GTID: 2558307067973159
Subject: Computer technology
Abstract/Summary:
With the arrival of the information age, email has become one of the most important means of communication. It has also brought a series of security problems: a growing number of attackers use email as a carrier to trick users into providing sensitive information or performing malicious operations, causing large financial losses and data leakage risks. Email security incidents occur frequently, so research on email security matters for improving network security protection and protecting users’ privacy. Deep learning has achieved great success in many fields, but it is still rarely applied to phishing email recognition, even though deep-learning-based recognition methods offer better performance and higher recognition efficiency than other methods and have become a trend in recent years. This thesis therefore studies phishing email recognition methods based on deep learning and develops a phishing email recognition system built on them. The main contributions of this thesis are as follows:

(1) To address the lack of a systematic feature representation in deep-learning-based phishing email recognition, this thesis proposes a multi-level, multi-feature analysis method for phishing emails. It analyzes the text features of phishing emails from four aspects: word features at the character layer, semantic features at the logical layer, emotion features at the cognitive layer, and URL (Uniform Resource Locator) features at the character layer, and proposes a suitable representation for each. For word features, an improved TF-IDF (Term Frequency-Inverse Document Frequency) method is used to filter feature words. For semantic features, a Word2Vec word vector model is built on the email corpus and can represent the semantic information of Chinese and English words simultaneously. For emotion features, to address the shortage of emotion corpora in the phishing email domain, an emotion text corpus covering the fear, curiosity, and urgency expressed in phishing emails is constructed. For URL features, given the particular syntax of URLs, N-gram segmentation and character-level encoding are used to obtain the feature representation. Finally, two new features are proposed: an attachment name correlation feature and a text correlation coefficient feature.

(2) To address the poor interpretability and robustness of current phishing email recognition models, this thesis proposes a recognition model based on multi-channel BiLSTM (Bidirectional Long Short-Term Memory Network) + Attention. The model feeds the multi-layer features extracted from emails into multi-channel networks for processing and analysis, and uses BiLSTM to learn the contextual dependencies of the text features. It also applies an adaptive Dropout regularization method to improve generalization. A scaled dot-product attention mechanism is then introduced to focus the model on the most relevant features, so that it identifies phishing emails more accurately. Finally, Focal Loss, an improved binary cross-entropy loss function, is used to optimize the model on the imbalanced email data set. The experimental results show that the proposed model outperforms the existing baseline models on every metric, and its accuracy reaches 98.87% on a mixed Chinese and English data set.

(3) Based on the two studies above, this thesis discusses the application value of a phishing email recognition system, and designs and implements such a system based on deep learning. Users can upload email data in a specified format or as an EML file; the system processes the input and outputs the recognition result together with a visualization of the feature attention weights, helping users understand the basis for the result more clearly.
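
As an illustration of why Focal Loss suits an imbalanced email data set, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not the thesis implementation: the function names are hypothetical, and the values gamma=2.0 and alpha=0.25 are commonly used defaults assumed here, not settings reported in the abstract.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for a single prediction (illustrative sketch).

    p: predicted probability that the email is phishing (class 1)
    y: true label, 1 for phishing and 0 for legitimate
    gamma, alpha: assumed defaults, not values taken from the thesis
    """
    # Clamp the probability so that log() never receives 0.
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights the positive and negative classes differently.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of predictions."""
    losses = [binary_focal_loss(p, y, gamma, alpha)
              for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to class-weighted binary cross-entropy; a larger gamma puts more of the training signal on hard, misclassified emails.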
Keywords/Search Tags:Phishing Email Identification, Deep Learning, Emotion Analysis, Attention Mechanism