| With the rapid development of the Internet, the era of big data has quietly arrived. Internet applications now include a wide variety of video software, and video content is flourishing. The movie industry in particular offers a large number of movie recommendation applications and movie resources. Although the wide variety of movie resources on movie websites gives users a rich visual feast, it also makes choosing difficult and leaves users facing "information overload". How to quickly solve the problem of "information overload" and recommend concise, personalized movie information to users has become a research hotspot in recent years. This paper designs and implements a movie recommendation system based on Flink. The system uses Flink, a new generation of stream computing engine, together with a real-time recommendation algorithm that integrates time weights and reward and punishment factors, to recommend movies and videos that better match users’ interests. On the one hand, it helps users save time when searching for video resources. On the other hand, attracting user traffic can bring potential business value to an enterprise. Traditional recommendation systems built on the Hadoop platform slow down significantly when faced with today’s massive data and complex algorithm models, and cannot deliver low-latency, efficient recommendations to users. Moreover, traditional recommendation algorithms based on collaborative filtering cannot perceive users’ interest drift in real time, which leads to unsatisfactory recommendation results. As a recommendation engine, Spark uses execution engine technologies such as in-memory computing and directed acyclic graphs (DAGs). Compared with Hadoop, Spark is more than 10 times faster when reading data from disk and more than 100 times faster when reading data from memory. The Spark computing engine can therefore process massive data efficiently. However, when dealing with large volumes of streaming data, Spark 
adopts a micro-batch processing architecture, whose real-time performance leaves room for improvement. Compared with Spark Streaming, Flink, a new generation of stream computing engine, processes real-time data with clearly better performance, which makes it better suited to streaming data. As for the recommendation algorithm, a hybrid recommendation algorithm can compensate for the shortcomings of any single recommendation algorithm, and adopting one significantly improves the recommendation results. Therefore, the main work of this paper is as follows: (1) In terms of the computing engine, the movie recommendation system platform is divided into offline recommendation and real-time recommendation. The offline recommendation platform uses the Spark computing engine, together with big data components such as Flume and Kafka, to build the movie recommendation system, providing a foundation for big data processing and analysis of movie resources. The Flink computing engine is used to build the real-time recommendation service and to process the streaming data generated by the movie recommendation system. (2) In terms of the recommendation algorithm, the system is divided into an offline recommendation algorithm and a real-time recommendation algorithm. After analyzing the recommendation algorithms commonly used in industry, a matrix factorization algorithm is adopted to address the sparsity of the movie rating matrix in offline recommendation, and Spark’s alternating least squares (ALS) method is combined with the heap sort algorithm to obtain an improved collaborative filtering recommendation algorithm. Furthermore, through repeated parameter tuning, a suitable latent factor model is trained and Top-N movie lists are generated for users as offline movie recommendations. After a user updates a movie rating, the result of the offline recommendation algorithm is essentially the same as the result generated when the user does not 
update, so it has no real-time recommendation capability. Therefore, this paper introduces the Ebbinghaus forgetting curve and a reward and punishment factor to build a real-time recommendation algorithm, and produces real-time Top-N recommendations for users by adjusting the time weight function. (3) Finally, a distributed cluster is built on three servers for comparative experiments. In the offline recommendation part of the movie recommendation system, the improved ALS algorithm based on heap sort runs on the Spark computing engine. With the RMSE essentially unchanged, the running speed of the algorithm model improves significantly. In addition, heap sort is introduced into the offline recommendation algorithm to solve the problem that the ALS algorithm in MLlib computes a Cartesian product during model prediction, which consumes a large amount of memory and takes a long time to execute. In the real-time recommendation part, the real-time recommendation algorithm introduces the Ebbinghaus forgetting curve, integrates the time weight with the reward and punishment factors to dynamically perceive user interest drift, and runs on the Flink computing engine. The experimental results show that the real-time recommendation algorithm improves significantly in accuracy and recall, and that its recommendations are more in line with users’ interests. Flink, a new generation of stream computing engine, is also compared with Spark. The experimental results show that Flink is faster than Spark as the amount of data keeps increasing. |
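
The two algorithmic ideas named in the abstract, a forgetting-curve time weight and heap-based Top-N selection, can be illustrated with a minimal Python sketch. This is not the paper's implementation. The abstract gives no formulas, so the exponential retention form `exp(-t / S)`, the one-week default for the strength constant `S`, and the function names `time_weight`, `decayed_score`, and `top_n` are all assumptions made for illustration. The reward and punishment factors are left out because the abstract does not define them.

```python
import heapq
import math

# Assumed memory-strength constant S (one week, in seconds); not from the paper.
ONE_WEEK = 7 * 24 * 3600.0


def time_weight(age_seconds, strength_seconds=ONE_WEEK):
    """Ebbinghaus-style retention exp(-t / S): 1.0 for a fresh rating, decaying with age."""
    if age_seconds < 0:
        raise ValueError("age_seconds must be non-negative")
    return math.exp(-age_seconds / strength_seconds)


def decayed_score(ratings, now, strength_seconds=ONE_WEEK):
    """Weighted mean of (rating, timestamp) pairs in which newer ratings count more."""
    weights = [time_weight(now - ts, strength_seconds) for _, ts in ratings]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * r for w, (r, _) in zip(weights, ratings)) / total


def top_n(scores, n):
    """Select the n highest-scoring (movie_id, score) pairs with a size-n min-heap.

    Only n items are held at once, so the full candidate list is never sorted.
    """
    if n <= 0:
        return []
    heap = []
    for movie_id, score in scores:
        item = (score, movie_id)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # The new item beats the weakest kept item, so swap it in.
            heapq.heapreplace(heap, item)
    return [(movie_id, score) for score, movie_id in sorted(heap, reverse=True)]


if __name__ == "__main__":
    now = 1_000_000.0
    # A recent 5-star rating outweighs a 1-star rating given four weeks earlier.
    print(decayed_score([(5.0, now), (1.0, now - 4 * ONE_WEEK)], now))
    print(top_n([("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)], 2))
```

The bounded heap keeps memory proportional to N rather than to the number of candidate movies, which is the general motivation for using a heap instead of materializing and sorting every user-movie score.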