Research And Application Of FCM Algorithms Based On Spark

Posted on:2020-07-08

Degree:Master

Type:Thesis

Country:China

Candidate:J L Feng

Full Text:PDF

GTID:2428330602954330

Subject:Management Science and Engineering

Abstract/Summary:

Continuous technological innovation, evolving business models, and growing demand for large-scale data processing have made the effective analysis of large data sets, and the extraction of value from them, a topic of concern in the early 21st century. In the big data era, it is therefore particularly important to ask how large data can be analyzed and processed effectively, and how classical algorithms can be improved and extended to serve big data analysis. This thesis works in that direction.

Among the many fuzzy clustering algorithms, Fuzzy C-Means (FCM) is the most widely used. It determines the category of each sample point by optimizing an objective function to obtain the membership degree of every sample point with respect to all cluster centers, and clusters the sample data accordingly. This approach lets FCM obtain better clustering results than other fuzzy clustering algorithms, even on data samples that are difficult to cluster.

The research combines theoretical analysis with practical experiments. A common single-machine environment and the Spark environment are compared in terms of application characteristics and models. The performance differences between the two architectures on iterative learning tasks are analyzed theoretically, leading to the conclusion that Spark has the advantage in iterative performance. The parallelization of FCM on the Spark platform is then discussed, and the algorithm is improved by using features specific to Spark. The robustness of the algorithm is also greatly improved after parallelization. To address the algorithm's weak clustering ability on non-linear data, a partitioning method and a feature weighting method are used so that non-linear data can be clustered effectively. The Canopy algorithm is fused with FCM to solve its initialization problems, such as the initialization of the cluster centers and of the distance matrix. Together, these changes greatly improve the efficiency and performance of FCM; the optimized algorithm is named SCWGIFP-FCM.

To validate SCWGIFP-FCM, the thesis uses the Anuran, Gesture Phase, 3D_spatial_network, and MoCap Hand Postures data sets from the UCI repository as test data, compares the results with those of the traditional FCM algorithm, and uses the PC index as the clustering quality criterion. The experiments demonstrate the effectiveness and usability of the optimized algorithm. Following this evaluation of quality and efficiency, the optimized algorithm is applied to airline customer data mining to solve a practical problem.
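
For readers unfamiliar with the baseline, the following is a minimal single-machine sketch of textbook FCM, written in plain Python. It is not the thesis's SCWGIFP-FCM and involves no Spark, Canopy initialization, or feature weighting. The function names `fcm` and `partition_coefficient`, the fuzzifier default `m=2.0`, and the random initialization are illustrative assumptions, and the sketch assumes that the PC index mentioned in the abstract is the partition coefficient, i.e. the mean over samples of the sum of squared memberships.

```python
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook Fuzzy C-Means (illustrative sketch, not the thesis's variant).

    points: list of equal-length numeric sequences; c: number of clusters;
    m: fuzzifier (> 1). Returns (centers, memberships).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Assumed initialization: random memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ij proportional to d_ij ** (-2 / (m - 1)).
        new_u = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, ctr)) for ctr in centers]
            if any(v == 0.0 for v in d2):
                # The point sits exactly on a center: split membership
                # among the zero-distance centers only.
                hits = [1.0 if v == 0.0 else 0.0 for v in d2]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                total = sum(inv)
                new_u.append([v / total for v in inv])
        change = max(
            abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new)
        )
        u = new_u
        if change < tol:
            break
    return centers, u


def partition_coefficient(u):
    """Mean over samples of the sum of squared memberships (between 1/c and 1)."""
    return sum(v * v for row in u for v in row) / len(u)
```

A value of the partition coefficient close to 1 indicates nearly crisp assignments, while a value close to 1/c indicates that memberships are spread almost evenly across clusters. In a Spark setting, the two weighted sums in the center update are the natural candidates for a distributed aggregation over partitions, although how the thesis actually structures that step is not stated in the abstract.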

Keywords/Search Tags:

Fuzzy C-means Clustering, Spark platform, Distributed calculation

Related items

1	Research On Spark Oriented Fuzzy C-means Clustering Algorithm
2	Research And Realization Of Clustering Algorithm Based On Spark Platform
3	Parallelizing K-means-based Clustering On Spark
4	The Parallelization And Optimization Of K-means Algorithm Based On Spark
5	Optimization And Implementation Of Clustering Algorithms Based On Spark Platform
6	The Application Of Fuzzy C-means Clustering In The Stock Investment
7	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel
8	Research And Implementation Of Hybrid Recommendation Algorithm Based On Spark Platform
9	Optimized Design And Implementation Of K-means Algorithm Based On Big Data Spark Platform
10	Optimization And Application Of K-means Clustering Algorithm Based On Spark Framework