Chroma Clustering Analysis Of Film Poster Based On Hadoop

Posted on: 2019-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: P K Zhu
GTID: 2428330593451709
Subject: Electronics and Communications Engineering
Abstract/Summary:
In the era of big data, the rapid development of Internet technology has caused exponential growth in the amount of information. This growth gives users many conveniences, but it also creates a problem: faced with massive data, users often do not know what they actually want. Recommendation algorithms emerged in this context. On the one hand, a recommendation algorithm helps users quickly find information that matches their expectations; on the other hand, it can push information that users are likely to find interesting. In addition, with the rise of machine learning and big-data cloud computing, traditional movie recommendation methods based on genre, title, or cast can no longer meet users' needs. Color, as an important image feature, has become one of the new breakthrough points for recommendation algorithms. However, as algorithm complexity increases and the data sets to be processed grow, a single machine can no longer bear the computational cost of processing data at this scale. As a result, the combined algorithm as a whole runs slowly and scales poorly. This paper therefore uses Hadoop parallel computing to remove the data-processing bottleneck.

The paper focuses on three issues: the accuracy of the clustering algorithm in extracting the dominant colors of film posters, the influence of color factors on the accuracy of a collaborative filtering recommendation algorithm applied to film data, and the running time of the algorithm. The traditional density idea is moved from the selection of initial cluster centers to the optimization of the clustering result. Because density is then calculated on a smaller data set, the time spent on density calculation is reduced. In addition, unlike selecting initial points by density, where iterative averaging can pull the centers away from the ideal high-density regions, this approach keeps the clustering result in those regions while preserving computational efficiency. To improve computational efficiency further, the paper moves away from traditional stand-alone computing and adopts the Hadoop parallel computing framework: the data set is split into data blocks, which are distributed to the worker data nodes and processed in parallel with the MapReduce programming model.

Experimental results show that the proposed improved clustering algorithm effectively shortens computation time and, compared with traditional density-based clustering, still puts the density idea into effect. Using Hadoop shortens the overall running time of the algorithm further. Additional experiments show that a collaborative filtering recommendation algorithm that incorporates color-information weights derived from the clustering results can improve recommendation accuracy to a certain extent.
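The abstract describes running K-means on poster pixels and applying density only afterwards, to optimize the clustering result, rather than when choosing initial centers. It does not give the density measure, so the sketch below is one possible reading, not the thesis's method: density of a pixel is assumed (hypothetically) to be the number of pixels of the same cluster within a radius `radius`, and each cluster's color is taken as the mean of its highest-density members. Because density is counted only inside each cluster, the pairwise cost is the sum of the squared cluster sizes rather than the square of the whole image size. All function names and parameters here are illustrative.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of RGB colors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def assign(pixels, centers):
    """Assign every pixel to its nearest center; returns one list per center."""
    clusters = [[] for _ in centers]
    for p in pixels:
        j = min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
        clusters[j].append(p)
    return clusters


def kmeans(pixels, k, iters=50, seed=0):
    """Plain K-means: random initial centers, no density used at this stage."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(pixels)), k)  # k distinct colors
    for _ in range(iters):
        clusters = assign(pixels, centers)
        # keep the old center if a cluster happens to be empty
        new = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers, assign(pixels, centers)


def refine_by_density(clusters, radius):
    """Hypothetical post-clustering step: density is counted only among the
    members of each cluster, and the cluster color becomes the mean of its
    highest-density members instead of the plain centroid."""
    r2 = radius ** 2
    refined = []
    for members in clusters:
        if not members:
            continue
        density = [sum(1 for q in members if dist2(p, q) <= r2) for p in members]
        top = max(density)
        core = [p for p, d in zip(members, density) if d == top]
        refined.append((mean(core), len(members)))
    refined.sort(key=lambda t: -t[1])  # largest cluster = most dominant color
    return refined


def dominant_colors(pixels, k, radius, seed=0):
    """Return [(color, cluster_size), ...], most dominant first."""
    _, clusters = kmeans(pixels, k, seed=seed)
    return refine_by_density(clusters, radius)


# Toy "poster": a saturated red with an off-red fringe, plus a blue area.
poster_pixels = ([(250, 10, 10)] * 30 + [(245, 15, 15)] * 5
                 + [(10, 10, 250)] * 20 + [(15, 15, 245)] * 5)
print(dominant_colors(poster_pixels, k=2, radius=5))
```

In this toy input the plain K-means centroid of the red cluster is about (249.3, 10.7, 10.7), a blend of the two reds, while the density step returns (250.0, 10.0, 10.0), the color most of those pixels actually have.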
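The abstract says the data set is split into blocks that MapReduce processes in parallel on worker nodes. A common way to map one K-means iteration onto MapReduce is shown below as an in-process simulation; it is not the thesis's code and does not use Hadoop's Java API. The functions only play the roles of a Hadoop Mapper (with in-mapper combining), the shuffle, and a Reducer; the names `map_block`, `reduce_cluster`, and `mapreduce_kmeans` are hypothetical.

```python
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_block(block, centers):
    """Mapper for one data block, with in-mapper combining: emits
    (cluster_id, (coordinate_sums, point_count)) once per cluster hit."""
    partial = {}
    for p in block:
        j = min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
        sums, n = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([s + x for s, x in zip(sums, p)], n + 1)
    return list(partial.items())


def reduce_cluster(values):
    """Reducer for one cluster_id: add the partial sums, divide by the count."""
    total, count = None, 0
    for sums, n in values:
        total = list(sums) if total is None else [t + s for t, s in zip(total, sums)]
        count += n
    return tuple(t / count for t in total)


def mapreduce_kmeans_step(blocks, centers):
    """One K-means iteration = one MapReduce job (shuffle simulated in-process)."""
    shuffled = defaultdict(list)
    for block in blocks:  # on a cluster, each block would be mapped on its own node
        for key, value in map_block(block, centers):
            shuffled[key].append(value)
    new_centers = list(centers)  # a cluster with no points keeps its old center
    for key, values in shuffled.items():
        new_centers[key] = reduce_cluster(values)
    return new_centers


def mapreduce_kmeans(blocks, centers, max_iters=20):
    """Driver: run one job per iteration until the centers stop moving."""
    for _ in range(max_iters):
        new_centers = mapreduce_kmeans_step(blocks, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


blocks = [[(0, 0), (2, 0)], [(10, 10)], [(12, 10), (1, 0)]]
print(mapreduce_kmeans(blocks, [(0.0, 0.0), (10.0, 10.0)]))
```

The combining step matters for the speed-up the abstract reports: each block sends at most k partial sums across the network instead of every pixel, so the shuffle cost does not grow with block size.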
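The abstract states that color information from clustering is used as a weight in collaborative filtering, but not how. A minimal sketch of one plausible design is item-based CF where item similarity is a linear blend of rating similarity and poster-color similarity. The blend weight `alpha`, the cosine measures, and the color labels with proportions are all assumptions for illustration, not the thesis's formula.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def item_vectors(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def blended_similarity(i, j, items, colors, alpha):
    """Hypothetical blend: (1 - alpha) * rating similarity + alpha * color similarity."""
    rating_sim = cosine(items.get(i, {}), items.get(j, {}))
    color_sim = cosine(colors.get(i, {}), colors.get(j, {}))
    return (1 - alpha) * rating_sim + alpha * color_sim


def predict(user, target, ratings, colors, alpha=0.3):
    """Item-based CF prediction of `user`'s rating for `target`; None if no signal."""
    items = item_vectors(ratings)
    num = den = 0.0
    for j, r in ratings[user].items():
        if j == target:
            continue
        s = blended_similarity(target, j, items, colors, alpha)
        num += s * r
        den += abs(s)
    return num / den if den else None


ratings = {
    "u1": {"A": 5, "B": 1},
    "u2": {"A": 4, "C": 5},
    "u3": {"B": 2, "C": 4},
}
# Hypothetical dominant-color proportions per poster, e.g. from the clustering step.
colors = {
    "A": {"red": 0.8, "black": 0.2},
    "B": {"blue": 0.9, "white": 0.1},
    "C": {"red": 0.7, "black": 0.3},
    "D": {"red": 0.9, "black": 0.1},  # new film with no ratings yet
}
print(predict("u1", "C", ratings, colors, alpha=0.0))
print(predict("u1", "C", ratings, colors, alpha=0.3))
print(predict("u1", "D", ratings, colors, alpha=0.3))
```

With `alpha=0` this reduces to plain item-based CF, and film D, which nobody has rated, gets no prediction at all. With `alpha=0.3` its red-dominant poster makes it similar to A, which u1 rated 5, so a prediction becomes possible; this is one way color could help with sparse rating data.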
Keywords/Search Tags: Clustering Algorithm, K-means, Recommendation System, Density-based, Hadoop, Collaborative Filtering Recommendation, Project Features