| Semi-supervised clustering integrates semi-supervised learning with cluster analysis: it uses given prior information (e.g., class labels, pairwise constraints) to guide the clustering process and improve the performance of data analysis and processing. At present, semi-supervised clustering is widely used in medicine, biology, and finance, as well as in complex network analysis. Semi-supervised clustering analysis of complex networks can discover communities in those networks, which helps users understand the network structure and find valuable information in it. However, because complex networks contain a huge number of nodes with complex relationships among them, existing methods still suffer from inaccurate representation of node correlations, a tendency to generate small clusters consisting of only one or two nodes, and insufficient use of supervised information. To address these problems, this paper conducts an in-depth study of semi-supervised clustering methods for complex networks. The main research results are as follows: (1) A complex network simplification method based on multi-scale similarity is given. The method first defines the multi-scale similarity of nodes based on structure similarity and attribute similarity, where structure similarity reflects the direct and indirect similarities between nodes through the shortest path and the number of nodes on the shortest path. Second, an Importance Score is constructed from the multi-scale similarity to assess the importance of each node. Then, the Importance Score of each node and its membership degree to other nodes are used to simplify the complex network. Finally, the effectiveness of this simplification method is verified through cluster analysis experiments. The experimental results show that the proposed method can intuitively and effectively reveal the community structure in a complex network, and can effectively 
improve the performance of the clustering algorithm compared with methods that use only structure information and do not perform complex network simplification. (2) A multi-scale constrained semi-supervised clustering method for complex networks is given. The method first uses the above simplification method to obtain the simplified network. Second, to fuse multi-scale constraint information, a constraint-transformation approach converts instance-level constraints (label information, pairwise constraints) as well as cluster-level constraints into pairwise constraints. Then, the complex network is partitioned into clusters with a reasonable number of nodes by considering the Importance Scores of nodes and the multi-scale constraint information, which reduces the probability of generating small clusters. The experimental results demonstrate that the proposed method effectively solves the problem of generating multiple small clusters, and that the multi-scale constraint fusion yields better clustering performance than unsupervised methods or methods that use only one kind of supervised information. (3) A prototype system for semi-supervised clustering of complex networks based on multi-scale similarity is designed and implemented. The prototype system includes functional modules such as data import, complex network simplification, semi-supervised clustering, and result saving. The system's operation results demonstrate its effectiveness and provide a new approach to semi-supervised clustering analysis of complex networks. |
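
The conversion of label information into pairwise constraints mentioned in result (2) can be illustrated with a minimal sketch. The abstract does not specify the transformation itself, so this assumes the common convention that two labelled nodes sharing a class form a must-link pair and two labelled nodes with different classes form a cannot-link pair. The function name `labels_to_pairwise` and the example data are hypothetical, not taken from the paper.

```python
from itertools import combinations


def labels_to_pairwise(labels):
    """Turn a partial node labelling into pairwise constraints.

    `labels` maps node id -> class label for the labelled nodes only.
    Assumption (not from the paper): same label -> must-link,
    different label -> cannot-link.
    """
    must_link, cannot_link = set(), set()
    # Visit every unordered pair of labelled nodes exactly once.
    for u, v in combinations(sorted(labels), 2):
        if labels[u] == labels[v]:
            must_link.add((u, v))
        else:
            cannot_link.add((u, v))
    return must_link, cannot_link


# Hypothetical example: three labelled nodes, two classes.
example_labels = {"a": 0, "b": 0, "c": 1}
must_link, cannot_link = labels_to_pairwise(example_labels)
```

Once labels are expressed this way, they can be pooled with any pairwise constraints supplied directly, which is one plausible reading of the constraint fusion described above.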