Community Detection On Heterogeneous Information Network

Posted on: 2018-04-27  Degree: Master  Type: Thesis
Country: China  Candidate: R Wang  Full Text: PDF
GTID: 2348330518495695  Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, social network analysis has been studied extensively. Most existing social network analysis is based on homogeneous information networks, which contain objects of a single type. Real-world networks are more complex: they contain multiple types of objects, and the relations between those objects are more diverse. It is therefore more accurate to model such networks as heterogeneous information networks.

Heterogeneous information networks consist of multi-typed, interconnected objects. A growing number of data mining tasks (such as similarity measure and community detection) are studied on them. Similarity measure evaluates the similarity between objects, and community detection finds the community structure of a network. Both underlie many other data mining tasks.

A meta path is a path consisting of a sequence of relations defined between different object types. Different meta paths carry different semantic information, so most data mining tasks in heterogeneous information networks are based on meta paths. Most meta path-based similarity measures do not consider values on links. However, networks with values on links are ubiquitous, for example in recommender systems and bibliographic networks, and ignoring those values may lead to wrong results. In addition, most community detection is carried out on homogeneous information networks; finding community structure is harder in heterogeneous information networks, which have more complex structure and richer semantic information.

To address these problems, we first propose a similarity measure for heterogeneous information networks that takes values on links into account by using weighted heterogeneous information networks and weighted meta paths. Traditional similarity measures for heterogeneous information networks can be applied to weighted meta paths by splitting and merging. Experiments on recommender systems, relevance search, and clustering analysis show that similarity measured on weighted meta paths is superior to that on traditional meta paths.

Second, we propose HCD, a community detection method for heterogeneous information networks. The algorithm has two parts: HCD_sgl, a community detection algorithm based on a single meta path, and HCD_all, which combines the results of all meta paths. HCD_sgl improves the traditional label propagation algorithm. It first reduces the number of initial labels by selecting seed nodes and detecting the community structure in the seed node network. A community belonging factor is then introduced so that the algorithm applies to overlapping community detection. HCD_all combines the results of all meta paths to obtain the community structure of the heterogeneous information network. Experiments on real and artificial datasets show that our method detects community structures in heterogeneous information networks effectively.
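
To illustrate why ignoring values on links can mislead a meta path-based similarity measure, the following is a minimal sketch under stated assumptions, not the method of the thesis. The abstract does not spell out how splitting and merging works; the sketch assumes one plausible reading, in which a weighted User-Movie-User meta path is split into one sub-path per rating value and the per-value path counts are merged by summing. The toy ratings and the function names `path_count` and `weighted_path_count` are hypothetical.

```python
# Hypothetical toy data: (user, movie) -> rating on a 1-5 scale.
ratings = {
    ("u1", "m1"): 5, ("u1", "m2"): 1,
    ("u2", "m1"): 5, ("u2", "m2"): 1,
    ("u3", "m1"): 1, ("u3", "m2"): 5,
}


def path_count(a, b, same_value=None):
    """Count User-Movie-User path instances between users a and b.

    With same_value=None, link values are ignored (plain meta path).
    Otherwise only paths whose two links both carry that rating count.
    """
    movies_a = {m: r for (u, m), r in ratings.items() if u == a}
    movies_b = {m: r for (u, m), r in ratings.items() if u == b}
    count = 0
    for movie, rating_a in movies_a.items():
        if movie not in movies_b:
            continue
        if same_value is None:
            count += 1
        elif rating_a == same_value and movies_b[movie] == same_value:
            count += 1
    return count


def weighted_path_count(a, b, values=range(1, 6)):
    """Assumed split-and-merge: one sub-path per rating value, then sum."""
    return sum(path_count(a, b, v) for v in values)


if __name__ == "__main__":
    for other in ("u2", "u3"):
        print(other, path_count("u1", other), weighted_path_count("u1", other))
```

In this toy data, u1 shares the same two movies with both u2 and u3, so the plain meta path cannot tell them apart. Once rating values are respected, only u2, whose ratings agree with those of u1, keeps any path instances.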
Keywords/Search Tags:heterogeneous information network, similarity measure, community detection, meta path