| Interrelated data has grown dramatically as diverse fields have become increasingly integrated with the Internet. It is therefore important to recommend, from a huge amount of data, the personalized items that customers are actually interested in. Although collaborative filtering recommendation algorithms have been widely applied in various fields, the limited iteration capability of a single machine makes the problems of data sparseness and scalability more prominent on very large data sets, which seriously affects the accuracy of the Slope One-Bi recommendation algorithm. The Spark platform can greatly improve recommendation efficiency by exploiting in-memory computation to iterate the recommendation algorithm. With these objectives, we aim to improve the recommendation algorithm and to parallelize it on the Spark platform. The problems of low recommendation accuracy, slow iteration speed, and high computational complexity of the original Slope One-Bi algorithm are analyzed on the basis of the Spark platform and related big data technology. The following studies were conducted: 1. A Canopy-k-medoids clustering algorithm, run in parallel on the big data platform, is proposed. The Canopy algorithm is first used to traverse the data set and obtain the number of clusters and the global center points. The k-medoids algorithm then computes the distance from each point to each center point for partitioning, which effectively improves the clustering result. Finally, a UCI data set is used to test performance; the speedup and scaleup ratios improve to a certain extent, and, compared with three other clustering algorithms, the proposed algorithm gives the best clustering result. 2. The clustering algorithm combining Canopy and k-medoids groups highly similar users together. The nearest neighbors are then searched dynamically within each cluster, based on whether the similarity between users is sufficiently high, and the Slope One-Bi algorithm is used for recommendation and prediction. Finally, the method is parallelized on the Spark big data platform. In conclusion, this work addresses the deficiencies of the Slope One algorithm and the bottleneck of single-machine iteration: dynamic k-nearest neighbor selection and Canopy-k-medoids clustering are added to improve recommendation performance and reduce the MAE value. |
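
The two-stage clustering described in study 1 can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation. It makes several assumptions: points are numeric tuples, the distance is Euclidean, and the Canopy pass is simplified to use only a tight threshold (here called `t2`, a hypothetical parameter) to pick the initial centers. The function names `canopy_centers`, `assign`, and `k_medoids` are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2):
    """Simplified Canopy pass (assumption: only the tight threshold t2 is used).

    Repeatedly take the first remaining point as a center and discard every
    point within t2 of it. The number of centers found gives k.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [p for p in remaining if euclidean(p, center) > t2]
    return centers


def assign(points, medoids):
    """Partition points by their nearest medoid."""
    clusters = [[] for _ in medoids]
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: euclidean(p, medoids[i]))
        clusters[nearest].append(p)
    return clusters


def k_medoids(points, initial_medoids, max_iter=20):
    """Basic k-medoids seeded with the Canopy centers.

    Each iteration assigns points to the nearest medoid, then replaces every
    medoid with the cluster member that minimizes the total distance to the
    other members. Stops when the medoids no longer change.
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        new_medoids = [
            min(c, key=lambda cand: sum(euclidean(cand, q) for q in c)) if c else m
            for c, m in zip(clusters, medoids)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)
```

Seeding k-medoids with Canopy centers removes the need to choose k by hand and avoids random initialization. In a Spark setting the assignment step would presumably be distributed across partitions, but that part is not shown here.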