
Application Study Of Big Data Clustering Based On The Complex Network

Posted on: 2017-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cao
GTID: 2370330488987682
Subject: Control engineering
Abstract/Summary:
Taking network big data as the research object, this thesis analyzes the characteristics of big data and finds that complex networks are an important component of network big data. Moreover, community detection in complex networks is naturally similar to clustering, so the thesis uses complex networks to study the clustering of network big data. Analysis of network data shows that complex networks contain a very large number of nodes and that saving time is critical when clustering them. Global community detection algorithms are therefore unsuitable for clustering large data networks, and the thesis instead approaches the problem from the perspective of local community discovery.

The thesis first proposes a big data clustering algorithm based on local key nodes. It introduces the notion of a local key node and, by combining it with global key node discovery, proposes a method for finding local key nodes. Each key node together with its neighbor nodes forms an initial community, which is then expanded outward with an improved fitness formula to obtain the final community. Further analysis of large data sets, however, shows that a single community in big data often contains several key nodes. Expanding outward from one local key node according to the fitness formula may therefore exclude the other key nodes, and an initial community made up of a key node and its neighbors may draw in nodes from adjacent communities; both effects lower the quality of the clustering. To solve these problems, the thesis proposes a big data clustering algorithm based on local key communities, which introduces the concept of the maximal clique and further improves the fitness formula to raise the quality of the clustering results.
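The local-key-node pipeline can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's method: the abstract gives neither the key-node criterion nor the improved fitness formula, so the code hypothetically treats a node as a local key node when its degree is no smaller than any neighbor's, and it swaps in the standard LFM fitness of Lancichinetti et al., f(S) = k_in / (k_in + k_out)^α, where k_in and k_out are the internal and external degree sums of community S. Unlike the full LFM procedure, it only adds nodes and never removes them. The names `build_graph`, `fitness`, `local_key_nodes`, `expand`, and `cluster` are illustrative.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets from an edge list (no self-loops assumed)."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def fitness(graph: Graph, community: Set[Hashable], alpha: float = 1.0) -> float:
    """LFM fitness k_in / (k_in + k_out) ** alpha, used as a stand-in formula."""
    k_in = 0
    k_out = 0
    for node in community:
        for nbr in graph[node]:
            if nbr in community:
                k_in += 1  # each internal edge is counted once from each endpoint
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def local_key_nodes(graph: Graph) -> List[Hashable]:
    """Hypothetical criterion: degree not smaller than any neighbour's degree."""
    return [
        v for v in graph
        if graph[v] and all(len(graph[v]) >= len(graph[u]) for u in graph[v])
    ]


def expand(graph: Graph, seed: Set[Hashable], alpha: float = 1.0) -> Set[Hashable]:
    """Greedily add the frontier node that raises the fitness the most."""
    community = set(seed)
    current = fitness(graph, community, alpha)
    while True:
        frontier = {u for v in community for u in graph[v]} - community
        best_node, best_score = None, current
        for u in frontier:
            score = fitness(graph, community | {u}, alpha)
            if score > best_score:
                best_node, best_score = u, score
        if best_node is None:  # no neighbour improves the fitness: stop
            return community
        community.add(best_node)
        current = best_score


def cluster(graph: Graph, alpha: float = 1.0) -> List[Set[Hashable]]:
    communities: List[Set[Hashable]] = []
    for key in local_key_nodes(graph):
        if any(key in c for c in communities):
            continue  # key node already absorbed by an earlier community
        seed = {key} | graph[key]  # initial community: key node plus its neighbours
        communities.append(expand(graph, seed, alpha))
    return communities


if __name__ == "__main__":
    # two hub-centred groups joined by the single bridge edge 4-6
    g = build_graph([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4),
                     (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (8, 9), (4, 6)])
    print([sorted(c) for c in cluster(g)])  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

In this toy graph the hubs 0 and 5 are the only local key nodes, and each expansion stops at the bridge because absorbing the far endpoint would lower the fitness from 12/13 to 14/16. The seed already contains every neighbor of the key node, which is exactly how nodes of an adjacent community can leak into the initial community when a key node sits near a community boundary.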
To build this algorithm, the thesis first analyzes the properties of maximal cliques and concludes that a clique is the most tightly connected group of nodes. All nodes of a maximal clique can therefore be assumed to lie within the same community, and the largest maximal clique within a community is the largest clique of that community, that is, its core. Combining the local key node discovery method with maximal cliques thus yields local key communities, which divide the data set into two parts: local key communities and common nodes. The original fitness formula is only suitable for expanding a community one node at a time, whereas now smaller key communities must also be joined to a community as a whole, so the fitness function has to be improved. Taking the largest local key community as the initial community, the algorithm then expands outward with the improved fitness formula to obtain the final community. Experiments on real data sets show that the algorithm is feasible and reduces time consumption. In addition, by analyzing the components of the algorithm, the thesis proposes parallelization strategies both for the corresponding parts and for the algorithm as a whole, and validates them on real data sets. The results show that the parallel strategy effectively reduces time consumption without affecting the quality of the results; the reduction is particularly evident on large data sets and scales with the number of parallel threads. The proposed parallel strategy is therefore suitable for clustering large data networks.
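The local-key-community variant and its parallel expansion can be sketched as follows, again as a minimal sketch under assumptions rather than the thesis's implementation. Maximal cliques are enumerated with the Bron-Kerbosch algorithm with pivoting, and the local key community of a key node is taken to be the largest maximal clique containing it, where a key node is hypothetically one whose degree is no smaller than any neighbor's. Because the improved fitness formula is not given in the abstract, the sketch keeps an LFM-style fitness k_in / (k_in + k_out)^α and instead changes the candidate units: a frontier node that belongs to a key community can only join together with its whole key community, while common nodes join one at a time. Each key community (largest first) is expanded in a `ThreadPoolExecutor` worker, and duplicate results are dropped. All function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets from an edge list (no self-loops assumed)."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph: Graph) -> List[FrozenSet[Hashable]]:
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting."""
    cliques: List[FrozenSet[Hashable]] = []

    def bron_kerbosch(r: Set, p: Set, x: Set) -> None:
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        # pivot with most neighbours in p; its neighbours are skipped as branch roots
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            bron_kerbosch(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    bron_kerbosch(set(), set(graph), set())
    return cliques


def local_key_communities(graph: Graph,
                          cliques: List[FrozenSet[Hashable]]) -> List[FrozenSet[Hashable]]:
    """Hypothetical rule: the largest maximal clique around each local key node."""
    keys = [v for v in graph
            if graph[v] and all(len(graph[v]) >= len(graph[u]) for u in graph[v])]
    result: List[FrozenSet[Hashable]] = []
    for node in keys:
        best = max((c for c in cliques if node in c), key=len)
        if best not in result:
            result.append(best)
    result.sort(key=len, reverse=True)  # largest key community first
    return result


def fitness(graph: Graph, community: Set[Hashable], alpha: float = 1.0) -> float:
    """LFM-style fitness k_in / (k_in + k_out) ** alpha (stand-in formula)."""
    k_in = sum(1 for v in community for u in graph[v] if u in community)
    k_out = sum(1 for v in community for u in graph[v] if u not in community)
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand(graph: Graph, seed: Set[Hashable], key_comms: List[FrozenSet[Hashable]],
           alpha: float = 1.0) -> Set[Hashable]:
    """Grow seed greedily; key-community members join only as a whole unit."""
    in_key = {v for kc in key_comms for v in kc}
    community = set(seed)
    current = fitness(graph, community, alpha)
    while True:
        frontier = {u for v in community for u in graph[v]} - community
        units = [set(kc) - community for kc in key_comms if kc & frontier]
        units += [{u} for u in frontier if u not in in_key]  # common nodes
        best, best_score = None, current
        for unit in units:
            score = fitness(graph, community | unit, alpha)
            if score > best_score:
                best, best_score = unit, score
        if best is None:
            return community
        community |= best
        current = best_score


def cluster_parallel(graph: Graph, alpha: float = 1.0,
                     workers: int = 4) -> List[Set[Hashable]]:
    key_comms = local_key_communities(graph, maximal_cliques(graph))
    # expansions only read shared data, so seeds can be grown concurrently
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(expand, graph, set(kc), key_comms, alpha) for kc in key_comms]
        results = [f.result() for f in futures]
    unique: List[Set[Hashable]] = []
    for c in results:
        if c not in unique:  # different seeds may converge on the same community
            unique.append(c)
    return unique
```

On two 4-cliques {0, 1, 2, 3} and {5, 6, 7, 8}, each with one extra node (4 attached to 0 and 1, 9 attached to 5 and 6) and joined by the edge 4-9, the cliques become the two key communities and `cluster_parallel` returns {0, 1, 2, 3, 4} and {5, 6, 7, 8, 9}. In CPython the global interpreter lock prevents threads from speeding up this pure-Python expansion, so observing the thread-count scaling the abstract reports would need `ProcessPoolExecutor` or a compiled implementation.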
Keywords/Search Tags: Big data, Cluster, Local, Fitness function, Parallel