With the spread of the Internet and the rapid development of the Internet of Things, data has entered a stage of automatic generation, and humanity has entered the era of big data. How to process and use these data efficiently has become a challenge. Recommender systems have effectively relieved the problem of information overload, and the recommendation algorithm is the core of a recommender system. Among recommendation algorithms, collaborative filtering (CF) is the most widely used. However, when the volume of data surges, recommendation algorithms running on a single machine need long processing times and cannot meet the real-time requirements of recommendation. Running CF on a distributed platform solves this problem, but CF still suffers from data sparsity, cold start, and limited scalability. This paper therefore systematically studies the recommendation process and principles of CF algorithms built on the RDD (Resilient Distributed Dataset) programming model of the Spark distributed platform, and proposes two optimized algorithms, each targeting different problems. The main research contents are as follows:

1. To address data sparsity and user cold start, this paper proposes a multi-factor collaborative filtering recommendation algorithm. First, user attributes are added to the input dataset and a clustering algorithm groups users into classes. User similarity is then computed only within each class, which both eases data sparsity and user cold start and reduces the amount of computation. Second, the similarity measure is improved by adding a user rating-difference factor, so that similarity between users is measured from a broader perspective, which makes it more effective. Third, the concept of associated items is introduced at the recommendation-list generation stage, and each user's recommendation list is generated from two aspects: the predicted rating and the relevance to items with predicted ratings. The algorithm is designed and implemented under the RDD programming model. Experimental results show that it eases data sparsity, improves scalability, and improves recommendation precision and F1 score to a certain extent.

2. To address the subjectivity of user feedback and the shortcomings of similarity measures in practice, a fuzzy top-N recommendation model based on user feedback is proposed. First, fuzzy set theory is applied to preprocess user feedback data, addressing the subjectivity of user ratings so that the data reflect users' actual preferences more accurately. Second, the concept of confidence is introduced and combined with two traditional similarity measures to form an improved similarity measure, CJ-sim; the rating prediction formula is also simplified to amplify the effect of similarity on the prediction results. Finally, the recommendation model is designed and implemented on the RDD programming model. Experimental results show that the model applies to both Item-based CF and User-based CF, alleviates the subjectivity of feedback data and the shortcomings of the similarity measure, and improves recommendation precision.
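The core idea of the first algorithm, computing user similarity only inside a cluster and scaling it by a rating-difference factor, can be illustrated with a small single-machine sketch. This is not the thesis's RDD implementation, and several pieces are assumptions: the cluster labels are hard-coded in place of the clustering step (the abstract does not name the algorithm), cosine similarity over co-rated items is assumed as the base measure, and the difference factor `1 / (1 + mean absolute rating gap)` is a hypothetical form, since the abstract gives no formula. The data and all names (`ratings`, `clusters`, `diff_factor`, `predict`) are illustrative.

```python
from math import sqrt

# Toy data (hypothetical): user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 3, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
    "u4": {"b": 4, "c": 2, "d": 5},
}
# Hypothetical cluster labels standing in for the output of clustering
# users by their attributes.
clusters = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}


def cosine(ru, rv):
    """Cosine similarity over co-rated items (assumed base measure)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    norm = sqrt(sum(ru[i] ** 2 for i in common)) * sqrt(sum(rv[i] ** 2 for i in common))
    return dot / norm if norm else 0.0


def diff_factor(ru, rv):
    """Hypothetical rating-difference factor: 1 / (1 + mean absolute gap)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    gap = sum(abs(ru[i] - rv[i]) for i in common) / len(common)
    return 1.0 / (1.0 + gap)


def similarity(u, v):
    """Base similarity damped by how far apart the two users' ratings are."""
    return cosine(ratings[u], ratings[v]) * diff_factor(ratings[u], ratings[v])


def predict(u, item):
    """Similarity-weighted average using only neighbours in u's cluster."""
    num = den = 0.0
    for v, rv in ratings.items():
        if v == u or clusters[v] != clusters[u] or item not in rv:
            continue
        s = similarity(u, v)
        num += s * rv[item]
        den += abs(s)
    return num / den if den else None


print(predict("u1", "d"))  # 4.0
```

Because `u4` sits in a different cluster, its rating of `d` is ignored when predicting for `u1`; restricting the neighbour search this way is what shrinks the similarity computation. In an RDD version the same logic would plausibly be keyed by cluster id, but that mapping is an assumption here.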
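The three steps of the second model (fuzzy preprocessing of feedback, a confidence-weighted CJ-sim, and simplified top-N scoring) can likewise be sketched on one machine. Every formula below is an assumption, because the abstract names the ideas but not the equations: the ramp membership function relative to each user's mean rating, the reading of CJ-sim as a blend of cosine and Jaccard similarity (suggested by the name, not stated), the association-rule-style confidence |I_u ∩ I_v| / |I_u|, and the unnormalized similarity-weighted sum used as the "simplified" prediction. All names and data are hypothetical.

```python
from math import sqrt

# Toy feedback (hypothetical data): user -> {item: raw rating on a 1-5 scale}
raw = {
    "u1": {"a": 5, "b": 4, "c": 4},          # generous rater
    "u2": {"a": 3, "b": 2, "c": 3, "d": 3},  # strict rater
    "u3": {"a": 4, "d": 5, "e": 2},
}


def fuzzify(user_ratings):
    """Membership of each rating in a user-relative fuzzy set 'liked'.

    Hypothetical ramp: 0 at one point below the user's mean rating,
    0.5 at the mean, 1 at one point above it (clamped to [0, 1]).
    """
    mean = sum(user_ratings.values()) / len(user_ratings)
    return {i: min(1.0, max(0.0, (r - mean + 1) / 2)) for i, r in user_ratings.items()}


prefs = {u: fuzzify(r) for u, r in raw.items()}


def cosine(pu, pv, common):
    """Cosine similarity of fuzzy preferences over co-rated items."""
    dot = sum(pu[i] * pv[i] for i in common)
    norm = sqrt(sum(pu[i] ** 2 for i in common)) * sqrt(sum(pv[i] ** 2 for i in common))
    return dot / norm if norm else 0.0


def cj_sim(u, v):
    """Hypothetical CJ-sim: confidence-weighted blend of cosine and Jaccard."""
    iu, iv = prefs[u].keys(), prefs[v].keys()
    common = iu & iv
    if not common:
        return 0.0
    confidence = len(common) / len(iu)  # association-rule-style confidence
    jaccard = len(common) / len(iu | iv)
    # High overlap: trust the preference-based cosine; low overlap: lean on Jaccard.
    return confidence * cosine(prefs[u], prefs[v], common) + (1 - confidence) * jaccard


def top_n(u, n):
    """Simplified scoring: unnormalized sum of similarity * neighbour preference."""
    scores = {}
    for v in prefs:
        if v == u:
            continue
        s = cj_sim(u, v)
        for item, p in prefs[v].items():
            if item not in prefs[u]:
                scores[item] = scores.get(item, 0.0) + s * p
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


print(top_n("u1", 2))
```

In this toy data, `u2`'s raw 3 maps to a higher preference than `u1`'s raw 4, reflecting that `u2` is the stricter rater. Dropping the usual division by the sum of similarities lets neighbours with higher CJ-sim dominate the ranking, which is one plausible way to "amplify the effect of similarity"; the thesis's actual formula may differ.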