
A Research Of Email Classification Integrated With User Attribute

Posted on: 2017-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Shen
GTID: 2348330503989811
Subject: Computer Science and Technology
Abstract/Summary:
Email has become an indispensable means of communication in daily life, but the prevalence of spam both consumes network resources and threatens public security. Current email classification techniques still misclassify normal email as spam at a high rate. Users' interest information in social networks offers a new opening for email classification. The thesis addresses two questions: how to make full use of an email's rich attributes and of user relationships in the social network to reduce the misjudgment rate for normal email, and how to overcome the performance problems of traditional databases when storing and managing the complex attributes of a social network.
The thesis proposes an email classification algorithm that integrates user attributes from the social network. The algorithm extracts users' interest information and the closeness between users, and expresses the relationship between users as a numeric value. This removes the limitation of considering only direct friends: the relationship between any two users in the social network can be calculated. The method relies on the observation that the higher the closeness between two users, the lower the probability that they send each other spam, which further improves classification accuracy. Taking the email subject into account reduces the misjudgment rate for normal email, based on the observation that no one replies to spam. The method also improves the efficiency of the existing email classification algorithm by using two user behaviors: deleting spam that was wrongly classified as normal email, and restoring normal email that was wrongly classified as spam. Finally, the thesis uses graph-based metadata management to store and manage the metadata extracted from email, which improves the performance of the classification algorithm.
According to the experimental results, on the same training set published by Microsoft, the proposed algorithm reaches an accuracy of 97.9%, an improvement of 9% and 5.8% over the naïve Bayes algorithm and the SOAP (Social network Aided Personalized and effective spam filter) algorithm, respectively. The misjudgment rate for normal email reaches 1.3%, a reduction of 15% and 8.7% compared with the Bayes and SOAP algorithms, respectively.
Keywords/Search Tags:email classification, social network, user attribute, email metadata management
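The abstract states that closeness is quantified so that it can be computed between any two users, not only direct friends, and that higher closeness implies a lower spam probability. The thesis's actual formulas are not given here, so the following is only a minimal sketch under assumptions: closeness on each direct edge is taken to be a number in (0, 1], the closeness of an indirect pair is assumed to be the largest product of edge values along any connecting path, and the hypothetical `adjusted_spam_score` simply scales a content-based spam probability by one minus the closeness. The function names, the sample graph, and both rules are illustrative choices, not the author's method.

```python
import heapq


def closeness(graph, source, target):
    """Best-path closeness between two users (assumed rule, not from the thesis).

    graph maps user -> {neighbor: weight}, each weight in (0, 1].
    A path's closeness is the product of its edge weights; the function
    returns the largest such product, or 0.0 if the users are not connected.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap emulated by negating the score
    while heap:
        neg_score, user = heapq.heappop(heap)
        score = -neg_score
        if user == target:
            return score
        if score < best.get(user, 0.0):
            continue  # stale heap entry, a better path was already found
        for neighbor, weight in graph.get(user, {}).items():
            candidate = score * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return 0.0


def adjusted_spam_score(content_score, sender_closeness):
    """Hypothetical combination: closer senders get a lower spam score."""
    return content_score * (1.0 - sender_closeness)


# Illustrative social graph; alice and carol are not direct friends.
graph = {
    "alice": {"bob": 0.9},
    "bob": {"alice": 0.9, "carol": 0.5},
    "carol": {"bob": 0.5},
}

indirect = closeness(graph, "alice", "carol")  # path alice -> bob -> carol
stranger = closeness(graph, "alice", "dave")   # dave is not in the graph
```

Because every edge weight is at most 1, extending a path can never raise its product, so a Dijkstra-style greedy search finds the best path. An unconnected sender gets closeness 0.0, which leaves the content-based score unchanged.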