Research On Mail Community Special Characters Find Algorithm

Posted on: 2015-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: X H Lin
GTID: 2268330428998811
Subject: Software engineering
Abstract/Summary:
With the coming of the information age, e-mail has become a widespread way of transmitting information. People's communication behavior forms an e-mail network, which carries information about the social relations of heavy e-mail users. Social Network Analysis (SNA) therefore has great potential for mining the social relations in an e-mail network.
The aim of this paper is to mine the special characters of an e-mail network, that is, users who play a distinctive role in it. Two types of special characters are considered: spammers and key leaders.
The Spammers Discovery Algorithm is proposed as an improvement on the Spam Community Mining Algorithm. The e-mail communication network is constructed as a weighted graph, which better reflects how information is actually transmitted in the network. Following the characteristics of spammers, the algorithm first strips nodes from the network and then reintegrates them, and it identifies spammers using evaluation functions that include an average density function and betweenness centrality computed with Dijkstra's algorithm.
Next, the idea of link analysis is used to find the key leaders of the mail network. On the basis of a directed graph, the algorithm first applies PageRank and uses each node's sending and receiving behavior to compute node importance. It then sorts the nodes, expands the set, computes similarity to screen the initial set, and improves the discovery and elimination of nodes with unidirectional links. The bidirectional connection degree of a node is added as the criterion for eliminating malicious one-way nodes. The filtered node set is the input to the EHITS algorithm, which takes each node's PageRank value as its importance and computes the authority and hub values of every node. The node with the highest authority score is the key leader being sought.
Finally, the algorithms are compared on the data set with degree centrality, betweenness centrality, HITS, and PageRank, using a defined confusion degree as the evaluation indicator, to assess their effectiveness and superiority.
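As a rough illustration of the link-analysis baselines named in the abstract, the sketch below computes plain PageRank and plain HITS on a tiny directed mail graph. It is not the EHITS algorithm of the thesis, whose details the abstract does not give. The edge list, the node names, the damping factor of 0.85, and the iteration count are all assumptions made for the example.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration on (sender, recipient) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes that send no mail is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {n: v / norm for n, v in scores.items()}

def hits(edges, iterations=100):
    """Plain HITS: returns (authority, hub) scores for every node."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of hub scores of the nodes mailing it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        auth = normalize(auth)
        # Hub of a node: sum of authority scores of the nodes it mails.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        hub = normalize(hub)
    return auth, hub

# Hypothetical mail graph: each pair is (sender, recipient).
edges = [("a", "boss"), ("b", "boss"), ("c", "boss"), ("boss", "a"), ("a", "b")]
rank = pagerank(edges)
auth, hub = hits(edges)
leader = max(auth, key=auth.get)  # node with the highest authority score
print(leader, round(rank[leader], 3))
```

In this toy graph the node that receives mail from the most senders ends up with the highest authority score, which mirrors the abstract's rule of picking the node with the highest authority score as the key leader.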
Keywords/Search Tags: Spammers Discovery Algorithm, EHITS Algorithm, mail networks, SNA