Research On Hybrid Recommendation Algorithm For Sparse Data

Posted on: 2022-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhao
Full Text: PDF
GTID: 2518306575463274
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of information technology and the popularization of the Internet, the informatization of society has become increasingly prominent. While the Internet brings convenience to users, it also brings certain troubles. The main one is the flood of information online: people cannot quickly and effectively find what they need, which is the so-called "information overload" problem. Collaborative filtering is one of the most mature and widely used recommendation techniques, and similarity measurement and user preference prediction are its two key components. Although a variety of similarity models and prediction methods have been proposed, several problems remain. First, the traditional user similarity model uses only co-rated items to calculate user similarity, which makes the sparsity problem of the dataset more serious; most item similarity methods have a similar defect. Second, only user ratings are used as the data source for measuring user similarity, while attributes of the items themselves are ignored, resulting in poor interpretability of the similarity results. Third, although the similarity model finds high-quality neighbors for the target users (target items), only a small number of those neighbors can participate in the prediction process, resulting in poor reliability of the prediction results.

The main work and research contents are as follows.

Firstly, similarity methods adapted to sparse environments are proposed. Most user similarity models require two users to have rated the same items when calculating user similarity. Similarly, most item similarity models require two items to have been rated by the same users when calculating item similarity, which intensifies the sparsity of the dataset. In this paper, Hellinger distance and Manhattan distance are introduced into the similarity calculation of users and items respectively, and similarity between users and between items is measured from the perspective of probability distributions. This removes the co-rating constraint of traditional similarity methods and alleviates the sparsity of the dataset to a certain extent.

Secondly, a multi-dimensional user similarity model is designed. The traditional collaborative filtering algorithm relies only on user ratings to calculate user similarity, so it is limited to a single dimension and is not comprehensive enough as a measure. In this paper, a user-tag matrix is constructed, and on this basis a user similarity model combining user ratings and item tags is proposed. Compared with the traditional user similarity model, the proposed model is more comprehensive and interpretable. In addition, since the number of item tags is far smaller than the number of items, the proposed similarity model is also highly efficient.

Thirdly, two improved prediction methods are proposed. The traditional user-based rating prediction model requires neighbor users to have rated the target item, but because the dataset is sparse, it is difficult to ensure that all neighbor users have done so. Therefore, we propose a new user-based rating prediction model. In this model, for a neighbor user who has not rated the target item, the item most similar to the target item among those the neighbor has rated is used in place of the target item. Similarly, the traditional item-based rating prediction model requires the target user to have rated the neighbor items, and because of the sparsity of the dataset, most neighbor items do not meet this condition. Therefore, we design a new item-based rating prediction model that adopts a new strategy for finding the nearest neighbor items: it searches for the neighbors of the target item only among the items the target user has rated, which ensures that the target user has rated every item in the neighbor set.

The experimental results show that the two prediction models proposed in this paper not only significantly improve neighbor utilization, but also effectively improve recommendation accuracy.
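The abstract does not give the formulas, so the following Python sketch is only one plausible reading of two of the ideas above, not the thesis's actual models. It assumes integer ratings on a 1-5 scale, defines similarity as one minus the Hellinger distance between two users' rating-value distributions, and uses a standard mean-centered weighted average for prediction. All function names and the `item_similarity` callback are hypothetical.

```python
import math
from collections import Counter

RATING_SCALE = (1, 2, 3, 4, 5)  # assumption: integer ratings on a 1-5 scale


def rating_distribution(ratings):
    """Empirical probability of each rating value in a list of ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    return [counts[r] / len(ratings) for r in RATING_SCALE]


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)


def hellinger_similarity(ratings_u, ratings_v):
    """Similarity in [0, 1]; the two users need not share any rated item."""
    p = rating_distribution(ratings_u)
    q = rating_distribution(ratings_v)
    return 1.0 - hellinger_distance(p, q)


def substitute_rating(neighbor_ratings, target_item, item_similarity):
    """Neighbor's rating of the target item, or of the most similar item
    the neighbor has rated when the target item itself is unrated."""
    if target_item in neighbor_ratings:
        return neighbor_ratings[target_item]
    best = max(neighbor_ratings, key=lambda i: item_similarity(target_item, i))
    return neighbor_ratings[best]


def predict(target_mean, neighbors, target_item, item_similarity):
    """Mean-centered weighted average in which every neighbor contributes.

    neighbors: list of (similarity_to_target_user, {item: rating}) pairs,
    each with a non-empty rating dict.
    """
    num = den = 0.0
    for sim, ratings in neighbors:
        neighbor_mean = sum(ratings.values()) / len(ratings)
        rating = substitute_rating(ratings, target_item, item_similarity)
        num += sim * (rating - neighbor_mean)
        den += abs(sim)
    return target_mean if den == 0 else target_mean + num / den
```

Under these assumptions, two users whose rating lists are [5, 4, 5] and [5, 5, 4] receive a similarity of 1.0 even if they have rated entirely different items, which is the kind of case a co-rating-based measure cannot score at all. In `predict`, a neighbor who has not rated the target item still contributes through the substituted rating, so no neighbor is discarded.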
Keywords/Search Tags: Hellinger distance, Manhattan distance, data sparsity, similarity model, neighbor utilization, collaborative filtering, recommendation algorithm