| A social network consists of individuals or organizations and the connections between them. By studying social network theory, we can try to uncover the relationships hidden in these communications and apply them to e-commerce, information recommendation, and similar areas. E-mail, as an important part of social networks, has become a powerful communication platform for cooperation and knowledge sharing. However, identifying communities, together with their topics and core persons, from a large dataset is a hard problem. In this thesis we propose a solution for discovering the communities of an email network, based on features of social networks such as centrality, small communities, and the small-world property. In this solution we first build a weighted, directed graph of the email network and identify the core nodes by computing the density variation with a greedy algorithm. Second, we divide the core nodes into groups, each of which represents a core graph and serves as an initial community. Finally, we expand the communities by assigning further nodes to them according to the similarity of their communication behavior, and we readjust the centrality of the communities. After analyzing the similarity between email links and web links, we use the PageRank algorithm to find core persons; the results show that this method is more accurate than the traditional statistics-based method. How to evaluate and interpret the communities is another important issue. In this thesis we apply objective criteria to evaluate the communities, and we also extract the topics of each community as a subjective evaluation. Finally, we develop a community mining system. The experimental results show that our solution discovers communities and core persons more accurately, so that a node's cooperation and the development of a community can be estimated from these results. |
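
As an illustration of the core-person step described above, the following minimal sketch builds a weighted, directed email graph and ranks its nodes with a textbook power-iteration PageRank. It is not the implementation used in the thesis: the function names `build_email_graph` and `pagerank`, the damping factor of 0.85, the use of message counts as edge weights, and the sample messages are all assumptions made for this example.

```python
def build_email_graph(messages):
    """Aggregate (sender, recipient) pairs into a weighted directed graph.

    graph[sender][recipient] is the number of emails sent (assumed weight).
    """
    graph = {}
    for sender, recipient in messages:
        graph.setdefault(recipient, {})  # make sure the recipient is a node
        out_edges = graph.setdefault(sender, {})
        out_edges[recipient] = out_edges.get(recipient, 0) + 1
    return graph


def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank by power iteration; returns {node: score}."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing mail is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for u in nodes:
            for v, w in graph[u].items():
                # Each sender passes rank in proportion to edge weight.
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical sample data: (sender, recipient) pairs.
messages = [
    ("alice", "carol"), ("bob", "carol"), ("dave", "carol"),
    ("carol", "alice"), ("bob", "alice"),
]
scores = pagerank(build_email_graph(messages))
core_person = max(scores, key=scores.get)
print(core_person)
```

In this sketch a person ranks highly when they receive mail from people who themselves rank highly, which mirrors the analogy between email links and web links; a purely statistical baseline would instead count messages per person.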