Research And Application Of Flink-based Distributed Recommendation System

Posted on: 2023-09-08 | Degree: Master | Type: Thesis
Country: China | Candidate: J W Zheng | Full Text: PDF
GTID: 2568307055459634 | Subject: Computer technology
Abstract/Summary:
With the rapid development of the internet and web technology, users enjoy increasingly convenient web services. At the same time, the problem of "information overload" has quickly become apparent, and the ability to filter information effectively is particularly important. Recommendation systems meet this need well. Faced with a huge and growing volume of data, however, traditional recommendation systems suffer from low computational efficiency, poor real-time performance and a reliance on a single recommendation method, so the need for a recommendation system with distributed, parallelised computing power is becoming increasingly urgent. To address these problems, this thesis uses Apache Flink, a new-generation stream computing engine, as the computing platform for a variety of recommendation services, and combines it with Hadoop, Hive, Flume, Redis, ZooKeeper, Kafka and other open-source big data technologies to build a distributed recommendation system.

Firstly, on the algorithm side, the real-time recommendation algorithm is improved and optimised. A time decay function is incorporated into the calculation of the recommendation priority so that the algorithm responds to changes in user preferences over time. The algorithm also takes into account the impact of negative and very low ratings, incorporates recent user ratings, generates multiple candidate item lists, and applies a time decay function across those lists to produce the final list. For similarity recommendation, the similarity between films is calculated from their genre information, with a TF-IDF weighting algorithm adjusting the weights of popular genres. Offline recommendation uses a collaborative filtering algorithm based on Alternating Least Squares (ALS), run on Alink, a general-purpose algorithm platform built on Flink, to improve the computational efficiency of the offline algorithm in distributed scenarios. The results of numerical experiments show that using implicit feature vectors to calculate similarity significantly improves the recommendation performance of the improved real-time recommendation algorithm compared with the feature-attribute-based approach. The improved real-time recommendation algorithm also achieves higher accuracy, recall and normalised discounted cumulative gain (NDCG) than the original algorithm, with better accuracy and recall at a time decay factor of 0.4 and better NDCG at a time decay factor of 0.5.

Secondly, the overall architecture of the distributed recommendation system is designed, and a movie recommendation system is built on the open-source MovieLens data. The system comprises a storage layer, a data processing layer, an application layer and a display layer. The storage layer uses Hadoop as the core of its distributed storage. The data processing layer uses a Flink cluster as the computation engine for the different recommendation services. The application layer contains an offline recommendation service, a real-time recommendation service, a statistical recommendation service and a similarity recommendation service, and the resulting recommendation lists are stored in MySQL. The display layer contains front-end and back-end parts and uses AngularJS; the back-end business system implements the full service logic at the Java EE level and is built with Spring. The final result is a hybrid movie recommendation system in which multiple recommendation services complement one another.
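The genre-based similarity step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact TF-IDF formula, so the plain `log(N / df)` weighting, the cosine similarity measure, the function names (`genre_idf`, `tfidf_vector`, `cosine`) and the toy catalogue below are all assumptions made for illustration.

```python
import math
from collections import Counter


def genre_idf(movie_genres):
    """Inverse document frequency per genre: log(N / movies carrying the genre).

    Assumed formula; the thesis may use a smoothed variant.
    """
    n = len(movie_genres)
    df = Counter(g for genres in movie_genres.values() for g in set(genres))
    return {g: math.log(n / count) for g, count in df.items()}


def tfidf_vector(genres, idf):
    """Sparse TF-IDF vector for one movie.

    A genre occurs at most once per movie, so TF = 1 / (number of genres).
    """
    unique = set(genres)
    return {g: idf[g] / len(unique) for g in unique}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy catalogue (not MovieLens data): "Drama" is on every movie.
movies = {
    "A": ["Drama", "Crime"],
    "B": ["Drama", "Romance"],
    "C": ["Drama", "Crime", "Thriller"],
    "D": ["Drama", "Comedy"],
}

idf = genre_idf(movies)
vectors = {title: tfidf_vector(genres, idf) for title, genres in movies.items()}

sim_ab = cosine(vectors["A"], vectors["B"])  # share only the ubiquitous genre
sim_ac = cosine(vectors["A"], vectors["C"])  # share the rarer genre "Crime"
```

In this toy data "Drama" appears on every movie, so its IDF is zero and it contributes nothing to similarity: A and B, which share only "Drama", are no more similar than unrelated films, while A and C remain similar through the rarer "Crime" genre. This is the kind of down-weighting of popular genres the abstract refers to.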
Keywords/Search Tags: recommender system, real-time recommendations, time decay function, Apache Flink, Hadoop