Research On Collaborative Filtering Algorithms Based On Clustering And Supervised Learning Models

Posted on: 2019-11-19  Degree: Master  Type: Thesis
Country: China  Candidate: H P Li  Full Text: PDF
GTID: 2429330566483536  Subject: Management Science and Engineering
Abstract/Summary:
The rapidly developing internet has become an important platform for information transfer and commodity trading, and an integral part of most people's lives. The growth of online information gives users rich content, but it also strains their ability and energy to find what they need. Recommender systems, an important approach to information filtering, are widely applied in various fields on the internet. Collaborative filtering, one of the most successful recommendation techniques, has made great progress in both theory and practice. However, as the numbers of users and items in recommender systems grow rapidly, traditional memory-based collaborative filtering algorithms face a scalability problem: they consume excessive computational resources. How to keep resource consumption at an acceptable level while guaranteeing recommendation efficiency and quality is currently a hot issue in the recommendation field. To address the scalability problem, this paper introduces clustering and supervised learning techniques into collaborative filtering recommendation and proposes three kinds of collaborative filtering recommendation algorithms. The main work consists of the following three parts.

First, traditional memory-based collaborative filtering algorithms represent a user (item) by one row (column) of the user-item rating matrix. In a recommender system with millions of users and items, such high-dimensional user and item vectors reduce recommendation efficiency. This paper proposes the DRU and DRI algorithms for dimensionality reduction of users and items, respectively. DRU (DRI) clusters users (items) with the bisecting k-means clustering algorithm and calculates the membership degrees of users (items) to the user (item) clusters. Each user (item) is then represented by the corresponding membership degree vector. Because the dimensionality of a membership degree vector is usually much lower than that of a rating vector, the cost of computing similarities between users or items in a memory-based collaborative filtering algorithm is greatly reduced, which improves online recommendation efficiency. This paper also proposes the DRUI algorithm, which integrates the predictions of DRU and DRI. Experimental results show that the proposed algorithms are much more efficient than the traditional memory-based collaborative filtering algorithms (UCF and ICF) in terms of online recommendation. In addition, although DRU and DRI on their own are inferior to UCF and ICF in terms of rating prediction precision, they are superior to UCF and ICF once integrated by DRUI.

Second, traditional memory-based collaborative filtering algorithms need to search for the nearest users (or items) of the target users (or items) when predicting ratings. As the number of users or items keeps increasing, the online recommendation efficiency of these algorithms is challenged. This paper therefore introduces the random forest model, which can be trained offline, and proposes the CRF algorithm. CRF first generates user and item membership degree vectors by clustering (the same process as in DRU and DRI). These vectors are then combined with the user-item rating matrix to construct the training dataset for the supervised learning model. The random forest model is then trained offline and used for online recommendation. Experimental results show that CRF is much more efficient than the memory-based collaborative filtering algorithms in terms of online recommendation. In addition, the rating prediction precision and classification accuracy of CRF are superior in most cases.

Third, to address the scalability problem, this paper introduces neural networks (one kind of incremental learning model) and proposes the CFBP_R regression model, the CFBP_C classification model, and the CFBP_SW weight-sharing model. CFBP_R and CFBP_C treat rating prediction as a regression task and a classification task, respectively. CFBP_SW is an improved model based on CFBP_C that greatly reduces the number of parameters by introducing a weight-sharing mechanism. Each input sample of the models is a (user ID, item ID, rating) triple, which is the most common storage format for rating data in recommender systems. That is, the raw rating data needs almost no preprocessing (such as conversion to a rating matrix) before CFBP_R, CFBP_C, and CFBP_SW can be used. The data on disk can be read in batches during training, which reduces the demand for memory. Additional rating data can be used to optimize the model parameters incrementally, without retraining the model. In addition, a new encoding (linear encoding) is proposed for the classification model CFBP_C, which needs to encode the ratings. Two versions of CFBP_C, CFBP_C (one-hot) and CFBP_C (linear), are derived by using the commonly used one-hot encoding and the linear encoding, respectively. Experimental results show that the proposed algorithms are superior to the traditional memory-based collaborative filtering algorithms (UCF and ICF) in terms of rating prediction precision, classification accuracy, and online recommendation efficiency. The proposed algorithms are less sensitive to data sparseness and can relieve the data sparsity problem to some extent. The proposed linear encoding method can significantly improve the precision of rating prediction. The weight-sharing model CFBP_SW is comparable to CFBP_C in terms of rating prediction precision even though it has far fewer parameters than CFBP_C.
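
The membership-degree representation used by DRU, DRI, and CRF can be illustrated with a small sketch. The abstract does not define how membership degrees are computed, so the code below assumes normalized inverse Euclidean distance to each cluster centroid. The toy rating matrix, the centroid values, and the names `membership_vector`, `cosine`, and `rows` are all hypothetical, and the centroids are supplied by hand in place of running bisecting k-means.

```python
import math

def membership_vector(vec, centroids, eps=1e-9):
    """Membership degrees of one rating vector to each cluster.

    Assumption: degree = normalized inverse Euclidean distance to the
    centroid. The thesis may define membership differently.
    """
    inv = [1.0 / (math.dist(vec, c) + eps) for c in centroids]
    total = sum(inv)
    return [w / total for w in inv]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy user-item rating matrix: rows are users, columns are items, 0 = unrated.
ratings = [
    [5, 4, 0, 1, 0, 0],
    [4, 5, 1, 0, 0, 1],
    [0, 1, 5, 4, 5, 0],
    [1, 0, 4, 5, 4, 0],
]

# Hypothetical centroids, standing in for the output of a clustering step.
user_centroids = [
    [4.5, 4.5, 0.5, 0.5, 0.0, 0.5],
    [0.5, 0.5, 4.5, 4.5, 4.5, 0.0],
]
item_centroids = [
    [4.5, 4.5, 0.5, 0.5],
    [0.5, 0.5, 4.5, 4.5],
]

# DRU-style step: each 6-dimensional user row becomes a 2-dimensional vector.
user_vecs = [membership_vector(row, user_centroids) for row in ratings]

# DRI-style step: each item column (length 4) becomes a 2-dimensional vector.
item_columns = [list(col) for col in zip(*ratings)]
item_vecs = [membership_vector(col, item_centroids) for col in item_columns]

# Similarities are now computed on the short membership vectors.
sim_01 = cosine(user_vecs[0], user_vecs[1])  # users with similar tastes
sim_02 = cosine(user_vecs[0], user_vecs[2])  # users with different tastes

# CRF-style training rows: (user memberships + item memberships, rating)
# for every observed rating.
rows = [
    (user_vecs[u] + item_vecs[i], ratings[u][i])
    for u in range(len(ratings))
    for i in range(len(ratings[u]))
    if ratings[u][i] > 0
]
```

In this sketch the similarity computation runs over 2 dimensions instead of 6; the saving grows with the gap between the number of clusters and the number of items or users. The `rows` list has the feature-and-label shape that an offline-trained supervised model, such as a random forest, would consume.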
Keywords/Search Tags:Recommender System, Collaborative Filtering, Clustering Technique, Random Forest, Artificial Neural Network