Research On Fuzzy Clustering Algorithm Based On Hadoop Platform

Posted on: 2016-11-30  Degree: Master  Type: Thesis
Country: China  Candidate: W W Zhong
GTID: 2308330473465458  Subject: Data mining
Abstract/Summary:
In the real world, the vast majority of phenomena have no definite boundaries and are fuzzy or random, so applying fuzzy theory to data mining has become a research hotspot. However, when faced with massive amounts of data, traditional data mining algorithms cannot meet user requirements in the era of big data. Combining data mining algorithms with cloud computing platforms, which offer powerful computing capacity, therefore has considerable research value and broad application prospects.

Firstly, traditional fuzzy clustering algorithms are sensitive to initialization and tend to fall into local optima during iteration. To address this, this paper studies an algorithm that combines a genetic algorithm with fuzzy C-means (GA-FCM). Experiments show that the algorithm effectively overcomes the traditional algorithm's sensitivity to initialization and converges to the global optimal solution with higher probability.

Secondly, this paper presents a new clustering algorithm (CSA-FCM) based on artificial immune theory, built on the well-known clonal selection algorithm. The new algorithm avoids the premature convergence of GA. In addition, the clonal selection algorithm searches with a set of candidate solutions, a strategy that is inherently parallel and random in its search direction. It can therefore locate the global optimal solution more accurately and converges faster than the genetic algorithm, which makes it more suitable for cluster analysis of large data sets.

Finally, this paper introduces the core architecture and operating mechanism of cloud computing and the Hadoop platform, analyzes the advantages of using cloud computing technology for data mining, and examines the MapReduce programming model in depth. It then describes how to implement the new algorithm (CSA-FCM) with the MapReduce programming model. The new algorithm runs more efficiently on the cloud platform, and evaluation with the MMTD criteria shows that it also produces accurate results there.
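
To make the link between fuzzy C-means and MapReduce concrete, the following is a minimal single-machine sketch in Python of one common way to phrase an FCM center update as a map step and a reduce step. It is an illustration under stated assumptions, not the thesis's CSA-FCM implementation: the function names (`memberships`, `map_point`, `reduce_cluster`, `fcm_iteration`, `fcm`), the fuzzifier `m = 2`, and the Euclidean distance are all choices made here, and the genetic and clonal selection parts are omitted.

```python
import math
from collections import defaultdict


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def map_point(point, centers, m=2.0):
    """Map step (assumed design): emit (cluster index, (u^m * x, u^m))."""
    for j, u in enumerate(memberships(point, centers, m)):
        w = u ** m
        yield j, ([w * x for x in point], w)


def reduce_cluster(values):
    """Reduce step: sum partial results for one cluster into a new center.

    Returns None when the total weight is zero, so the caller can keep
    the previous center instead of dividing by zero.
    """
    values = list(values)
    dim = len(values[0][0])
    den = sum(w for _, w in values)
    if den == 0.0:
        return None
    return [sum(vec[d] for vec, _ in values) / den for d in range(dim)]


def fcm_iteration(points, centers, m=2.0):
    """One FCM center update, with an in-memory stand-in for the shuffle."""
    grouped = defaultdict(list)
    for p in points:
        for j, value in map_point(p, centers, m):
            grouped[j].append(value)
    updated = []
    for j, old in enumerate(centers):
        new = reduce_cluster(grouped[j])
        updated.append(list(old) if new is None else new)
    return updated


def fcm(points, centers, m=2.0, tol=1e-6, max_iter=100):
    """Repeat the update until the centers move less than tol."""
    for _ in range(max_iter):
        new_centers = fcm_iteration(points, centers, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

In this formulation each map call needs only one data point and the current centers, so the points can be split across workers, and each reduce call only adds up per-cluster partial sums. On a real Hadoop cluster the `grouped` dictionary would be replaced by the framework's shuffle phase, and the outer loop in `fcm` would become one job per iteration.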
Keywords/Search Tags: Big data, cloud computing, fuzzy clustering, genetic algorithm, artificial immune