| From Amazon's product recommendations to Netflix's movie suggestions, fields such as books, music and movies all depend on personalized recommendation systems. Hundreds of thousands of new books are published in China every year, and the volume of book information produced annually far exceeds what any individual needs. Information overload in the book field is therefore becoming increasingly prominent, and personalized recommendation is becoming increasingly important to the book industry. The similarity computation method is a key component of a personalized recommendation algorithm and directly affects its performance. Traditional similarity methods mainly use the commonly rated items between users or between items to calculate similarity. When the data is relatively sparse, this approach gives unsatisfactory recommendations. In addition, a single recommendation algorithm is often ineffective in real-world scenarios. To address these problems, this paper carries out the following work. First, a collaborative filtering recommendation algorithm based on the RJaccard coefficient (RJCF) is proposed to address the difficulty of finding the similarity between users quickly and accurately when data is relatively sparse. The algorithm uses the RJaccard coefficient to calculate similarity from the global rated items of users or items, and so finds the similarity between users quickly and accurately on relatively sparse data. Second, to compensate for the RJCF algorithm's limited ability to mine users' implicit information, a new hybrid recommendation model is proposed. This model combines the RJCF algorithm with linear regression models to predict ratings; machine learning is used to build separate linear regression models from the user perspective and the item perspective. The simulation of this hybrid model was carried 
out on the Book-Crossing data set. The results show that the hybrid recommendation algorithm outperforms a single algorithm. Finally, to handle the data storage and computation that modeling requires when the data volume is large, a Hadoop-based big data platform was built, and the simulation of the hybrid recommendation model was completed on it. The setup involves installing CDH on a Linux server and editing its core configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml). Once the platform was built, HDFS, MapReduce, Mahout and other distributed components in the cluster were used to complete every step of the book recommendation system, from data storage and data cleaning to recommendation algorithm modeling. In summary, this paper combines a new similarity calculation method with linear regression models to propose a hybrid recommendation method for the book field, and verifies it experimentally on the Hadoop big data platform with good results. |
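
The abstract does not define the RJaccard coefficient or the rule used to fuse the two predictors, so the following is only a rough sketch of the general idea. It assumes the plain Jaccard coefficient over each user's full set of rated books as a stand-in for RJaccard, and a simple weighted average as a stand-in for the fusion step. The names `jaccard` and `blend`, the `weight` parameter and the toy data are all hypothetical and do not come from the paper.

```python
def jaccard(items_a, items_b):
    """Plain Jaccard coefficient between two sets of rated item IDs.

    Stand-in for RJaccard, which the abstract does not define.
    """
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0  # two users with no ratings at all
    return len(a & b) / len(union)


def blend(cf_score, regression_score, weight=0.5):
    """Weighted average of a collaborative-filtering prediction and a
    regression prediction. The weighting scheme is an assumption."""
    return weight * cf_score + (1.0 - weight) * regression_score


# Hypothetical toy data: user ID -> set of rated book IDs.
ratings = {
    "u1": {"b1", "b2", "b3"},
    "u2": {"b2", "b3", "b4"},
    "u3": {"b5"},
}

sim_u1_u2 = jaccard(ratings["u1"], ratings["u2"])  # 2 shared of 4 distinct
sim_u1_u3 = jaccard(ratings["u1"], ratings["u3"])  # no overlap
predicted = blend(cf_score=4.0, regression_score=3.0, weight=0.5)
```

Because the coefficient is computed over each user's whole set of rated items, it still yields a value when two users share few or no co-rated books, which is the sparse-data situation the abstract describes.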