Community Detection And Association Analysis Based On Email Behavior

Posted on: 2018-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: W X Li
GTID: 2518305966950389
Subject: Software engineering
Abstract/Summary:
The Internet has developed rapidly in recent years, and people rely on it heavily. Network data is growing quickly and carries a great deal of knowledge that can benefit business, national security, and other areas. Big-data technologies such as machine learning and distributed computing platforms also drive knowledge discovery. To obtain knowledge from users' network behavior, capturing network packets at key routers is an efficient approach: all network behavior is recorded in those packets, and the information is saved in Pcap files. The Pcap files are parsed into SMTP packets on Hadoop according to the network protocols, a process that involves packet parsing and TCP reassembly. The email behaviors in the SMTP packets are then converted into a directed graph, from which the knowledge in the email social network can be mined. GraphX is used to mine this graph-structured data. In the email social network analysis, the Label Propagation algorithm and the Fast Unfolding algorithm are used to detect communities. The community detection experiment showed that Fast Unfolding gives more accurate and more stable results. The PageRank algorithm is then used to identify the importance of each user within a community. Finally, given the many features of each user, community features can be evaluated with the help of user importance. Besides the topological characteristics of the email social network, the association among a user's email behaviors is a key feature. For example, user A sends emails to C and D within x hours after receiving an email from B; this is a typical behavior pattern of A. An algorithm based on PrefixSpan is designed to discover such patterns. The experiments show that community features and user behavior patterns can be discovered efficiently, and that the algorithms scale to big data and distributed environments.
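The behavior pattern described above (A receives an email from B, then writes to C and D within x hours) can be illustrated with a small single-machine sketch. This is not the thesis's PrefixSpan-based algorithm, which the abstract does not detail; it is a plain frequency count over a time window. The function name `response_patterns`, the `(timestamp, sender, recipient)` record layout, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def response_patterns(emails, user, window_hours):
    """Count (trigger sender, set of follow-up recipients) pairs for `user`.

    `emails` is a list of (timestamp, sender, recipient) tuples; this record
    layout is an assumption, not the thesis's actual data structure.
    """
    window = timedelta(hours=window_hours)
    emails = sorted(emails)
    # Emails sent by the user, as (time, recipient) pairs.
    sent = [(t, r) for t, s, r in emails if s == user]
    counts = Counter()
    for t_in, sender, recipient in emails:
        if recipient != user:
            continue
        # Everyone the user wrote to within the window after this incoming email.
        targets = frozenset(
            r for t_out, r in sent if t_in < t_out <= t_in + window
        )
        if targets:
            counts[(sender, targets)] += 1
    return counts


# Hypothetical sample data: A twice answers B's email by writing to C and D.
t0 = datetime(2018, 1, 1, 9, 0)
h = timedelta(hours=1)
emails = [
    (t0, "B", "A"),
    (t0 + 1 * h, "A", "C"),
    (t0 + 2 * h, "A", "D"),
    (t0 + 24 * h, "B", "A"),
    (t0 + 25 * h, "A", "C"),
    (t0 + 26 * h, "A", "D"),
    (t0 + 48 * h, "E", "A"),
    (t0 + 60 * h, "A", "C"),
]
patterns = response_patterns(emails, user="A", window_hours=3)
for (trigger, targets), support in patterns.items():
    print(trigger, "->", sorted(targets), "support:", support)
```

With a 3-hour window, the only pattern found in this sample is "B -> C, D", observed twice; the email from E draws no reply inside the window and is ignored. A sequential pattern miner such as PrefixSpan would generalize this idea to ordered subsequences and a minimum-support threshold.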
Keywords/Search Tags:big data, GraphX, community detection, community persona, association analysis of user email behavior