| With the rapid development of Internet and computer technology advances, The phenomenon of information overload apper in the e-commerce. In order to deal with this situation as soon as possible, there comes into being recommend system,and the system that based on collaborative filter algorithm get a huge affirmation in all system.But there are some difficult problems, which the greatest impact is too sparse data problem.This thesis will be focused on the phenomenon of sparsity data in the collaborative filtering recommendation system, and put forward an optimization algorithm for the traditional program. First, in-depth study of traditional collaborative filtering algorithm at the scene of sparse data, the advantages and disadvantages of the similarity calculation existence, comparison and correction similarity calculation method, to get collaborative filtering algorithm which based on improved calculation methods. Then, reduce the user- item matrix’s dimensions according to the characteristics of sparse data. Through the establishment of scalable clustering method to get effective clutering of original data, and get the collaborative filtering algorithm that base on dimension reduction and clustering at the scene of sparse data, then apply the optimized algorithms recommended to the movie scene. The main contents include:(1) In-depth study of the problems of traditional method can not accurately express the similarity between user at the scene of sparse data, and put forward improve algorithms on this basis. In connection with the problems that traditional similarity calculation method generally do not consider the number of common itemd between users and do not taking the evaluation criteria for each user is inconsistency into account to improve the similarity calculation. The optimized collaborative algorithm can be more fully consider the defect of the traditional method.(2) To handling sparse data research an effective program. Apply principal compon ent analysis to reduce the dimension of matrix which is too sparse, greatly reduced the data sparsity. Research scalability clustering methods, useing clustering algorithm to effectively cluster the initial data, make all the users which have a high similarity into the same category, so we can sonstruct a better neighbor set, thus reducing the impact of sparse data on the traditional algorithm, and improving the problem that when selecting a neighbor problem the traditional collaborative filtering algorithm is too complicated.Finally,using the film dataset movielens to verfy the optimization algorithm of this thesis,the results show that in the movie recommendation scene,it can indeed improve the sparsity problem. |