Application Study Of Big Data Clustering Based On The Complex Network

Posted on:2017-05-30

Degree:Master

Type:Thesis

Country:China

Candidate:Y Cao

Full Text:PDF

GTID:2370330488987682

Subject:Control engineering

Abstract/Summary:

This thesis takes network big data as its research object. An analysis of the characteristics of big data shows that complex networks are an important component of network big data, and that discovering community structure in a complex network is naturally similar to clustering. The thesis therefore uses complex networks to study the clustering of big data networks. Analysis of the network data shows that such networks contain a very large number of nodes and that running time is critical when clustering them, so global community discovery algorithms are not suitable. The thesis therefore studies the clustering of big data networks from the viewpoint of local discovery and proposes a big data clustering algorithm based on local key nodes. First, it introduces the concept of local key nodes and, drawing on global key node discovery, proposes a method for finding them. Then, taking a key node and its neighbor nodes as the initial community, it expands outward using an improved fitness formula to obtain the final community. Further analysis of large data sets shows that a single community often contains several key nodes. Expanding outward from one local key node according to the fitness formula may therefore exclude the other key nodes, and an initial community made up of a key node and its neighbor nodes may absorb nodes from adjacent communities; both effects lower the quality of the clustering. To solve these problems, the thesis proposes a big data clustering algorithm based on local key communities, which improves the quality of the clustering results by introducing the concept of the maximal clique and further improving the fitness formula.
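
The key-node expansion described above can be illustrated with a minimal sketch. The abstract gives neither the key-node criterion nor the improved fitness formula, so everything concrete here is an assumption: a node counts as a local key node when its degree is at least that of every neighbor, and fitness takes the widely used local form k_in / (k_in + k_out)^alpha. The names `build_adj`, `local_key_nodes`, `fitness` and `expand` are hypothetical.

```python
from collections import defaultdict


def build_adj(edges):
    """Undirected adjacency sets from an iterable of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def local_key_nodes(adj):
    """Assumed criterion: degree is >= the degree of every neighbor."""
    return [u for u in adj
            if adj[u] and all(len(adj[u]) >= len(adj[v]) for v in adj[u])]


def fitness(adj, community, alpha=1.0):
    """Assumed fitness: k_in / (k_in + k_out) ** alpha.

    k_in counts every internal edge twice (once per endpoint);
    k_out counts edges that leave the community.
    """
    k_in = k_out = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand(adj, seed, alpha=1.0):
    """Greedily add the bordering node that raises fitness most; stop when none does."""
    community = set(seed)
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        base = fitness(adj, community, alpha)
        best, best_gain = None, 0.0
        for v in sorted(frontier):
            gain = fitness(adj, community | {v}, alpha) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            return community
        community.add(best)


# Two "wheel" communities (a hub plus a 5-node ring) joined by one bridge edge.
edges = [(0, i) for i in range(1, 6)] + [(i, i % 5 + 1) for i in range(1, 6)]
edges += [(6, i) for i in range(7, 12)] + [(i, (i - 6) % 5 + 7) for i in range(7, 12)]
edges.append((1, 7))
adj = build_adj(edges)

for key in local_key_nodes(adj):
    # Initial community: the key node together with its neighbors.
    print(key, sorted(expand(adj, {key} | adj[key])))
# prints:
# 0 [0, 1, 2, 3, 4, 5]
# 6 [6, 7, 8, 9, 10, 11]
```

In this toy graph each community has exactly one key node, so the seed never reaches across the bridge. When a key node sits next to another community, the same seed would include foreign nodes, which is the weakness the abstract goes on to address.
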
First, the thesis analyzes the properties of the maximal clique and concludes that a clique is the most tightly connected group of nodes. It follows that all nodes of a maximal clique belong to the same community, and that the largest maximal clique within a community is the largest such group in that community, that is, the core of the whole community. Local key communities can therefore be found by combining the discovery method for local key nodes with the maximal clique. In this way, a data set is divided into two parts: local key communities and ordinary nodes. The original fitness formula handles only the addition of a single node, whereas now smaller key communities must also be merged into a community, so the fitness function must be improved. Then, taking the largest local key community as the initial community, the thesis expands it with the improved fitness formula to obtain the final community. Finally, tests on real data sets show that the algorithm is feasible and reduces running time. In addition, by analyzing the components of the algorithm, the thesis proposes parallel strategies for the relevant parts and for the algorithm as a whole, and validates them on real data sets. The results show that the proposed strategy effectively reduces running time without affecting the quality of the results; the effect is particularly evident on large data sets, and the reduction is proportional to the number of parallel threads. The proposed parallel strategy is therefore suitable for clustering big data networks.
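
As a rough illustration of seeding a local key community from a key node, the sketch below enumerates the maximal cliques that contain a given node and keeps the largest. The abstract does not say which clique algorithm the thesis uses; Bron-Kerbosch with pivoting is swapped in here as a standard choice, and the names `maximal_cliques_with` and `local_key_community` are hypothetical.

```python
def maximal_cliques_with(adj, v):
    """Yield every maximal clique containing node v (Bron-Kerbosch with pivoting).

    adj maps each node to the set of its neighbors in an undirected graph.
    """
    def bk(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored.
        if not p and not x:
            yield set(r)
            return
        # Pivot on the node with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for u in list(p - adj[pivot]):
            yield from bk(r | {u}, p & adj[u], x & adj[u])
            p = p - {u}
            x = x | {u}

    # Every clique containing v lies inside v's neighborhood.
    yield from bk({v}, set(adj[v]), set())


def local_key_community(adj, key_node):
    """Assumed seed rule: the largest maximal clique containing the key node."""
    return max(maximal_cliques_with(adj, key_node), key=len)


# A 4-clique {0, 1, 2, 3}, a triangle {0, 1, 4} and a pendant edge {0, 5}.
adj = {
    0: {1, 2, 3, 4, 5},
    1: {0, 2, 3, 4},
    2: {0, 1, 3},
    3: {0, 1, 2},
    4: {0, 1},
    5: {0},
}
print(sorted(local_key_community(adj, 0)))  # prints [0, 1, 2, 3]
```

Nodes 4 and 5 are left outside the seed even though they neighbor the key node. They would rejoin only if a later fitness-based expansion step favored them, which is the behavior the clique-based seed is meant to provide.
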

Keywords/Search Tags:

Big data, Cluster, Local, Fitness function, Parallel

Related items

1	Calculate The Parallel Method Of Moments In Electromagnetics And The Realization Of The Cluster Environment
2	PC Cluster-Based Parallel Processing And Visualization For Massive Mine Spatial Data
3	Pc Cluster-based Parallel Processing And Visualization For Massive Mine Spatial Data
4	Research On Intelligent Decision-making Methods In Parallel Emergency Management Of A Chemical Cluster
5	Radiation Calculation And Analysis Of FY3 In The Big Data Environment
6	Massive Data Many Task Parallel Data Framework For GWAS
7	Study On Execution Time Prediction For Parallel Geocomputation In Multi-Core Cluster Environment
8	The Parallel Algorithm For Multibody System Dynamics Based On The Cluster Platform
9	Models And Algorithms Of Local Partition For Network Data
10	Price Time Series And Its Local Extremum Analysis