| With the rapid development of the Internet and the film industry, the two have become increasingly intertwined, and film recommendation services have emerged to help users make reasonable choices among a vast number of films. Mainstream recommendation services in the industry currently rely mainly on mining the relationships between films and on building user profile models. A recommendation system draws on the behavior and characteristics of a user and of similar users, together with the relationships between films, that is, the film relation graph, to provide personalized recommendations. At present, however, the relationships between films are derived mainly from the tag features of the films themselves. This approach is sensitive to the number and granularity of the tags, so the resulting film relation graph has low confidence, which reduces the accuracy of personalized recommendation. Online film reviews carry the rich emotions and preferences of viewers, and they also reflect how closely different films are related at the semantic and emotional levels. Taking online film reviews as the starting point, we can extract relation values between films from the text and construct a film relation graph based on semantic and emotional tendencies. This reduces the dependence on feature tags when mining inter-film relations and computes the relationships between films from a new angle, serving as an effective supplement to the original method and providing a better reference for personalized recommendation and other downstream services. In this paper, we take online film reviews as the object of study, compute the similarity between films based on paragraph2vec, and build a relation graph of different films.
The main work is summarized as follows. First, this paper introduces the methods and process for obtaining online film review data, including crawling the seed links and the review text, and handling the anti-crawler measures of the target website efficiently. We build a film review crawler system based on Scrapy to provide a reliable source of data. Second, we introduce the text preprocessing steps, such as word segmentation, vocabulary building, and Huffman tree construction, and make a number of optimizations and improvements to the preprocessing pipeline, which lays the foundation for the subsequent vector computation. Third, we carry out the vector computation, which comprises the computation of word vectors and of paragraph vectors. In view of the particular characteristics of the review text, both computations are optimized accordingly: we use a model fusion method to compute the word vectors, and we introduce and improve existing text vector methods and models to make them more efficient and reliable. Fourth, on the basis of the vector computation, we complete the construction of the film relation graph system. To verify the feasibility of the model and method, we introduce the LDA (Latent Dirichlet Allocation) topic model for comparison: we compute vectors of the online reviews with it and construct a relation graph on that basis. The experimental results show that the text vector model proposed in this paper achieves a 10% improvement over the method based on the LDA topic model. |
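
As a rough illustration of the final step described above, the following minimal sketch builds a film relation graph from per-film vectors using cosine similarity. It is not the system described in this paper: the film names, the toy three-dimensional vectors, the `top_k` parameter, and the assumption that each film is already represented by a single vector (for example, one derived from the paragraph vectors of its reviews) are all hypothetical and chosen only for demonstration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def build_relation_graph(film_vectors, top_k=2):
    """Map each film to its top_k most similar films with their scores.

    film_vectors: dict of film name -> vector (assumed precomputed).
    """
    graph = {}
    for film, vec in film_vectors.items():
        scores = [
            (other, cosine(vec, other_vec))
            for other, other_vec in film_vectors.items()
            if other != film
        ]
        # Highest similarity first
        scores.sort(key=lambda pair: pair[1], reverse=True)
        graph[film] = scores[:top_k]
    return graph


# Hypothetical toy vectors; real ones would come from the trained model.
film_vectors = {
    "film_a": [0.9, 0.1, 0.0],
    "film_b": [0.8, 0.2, 0.1],
    "film_c": [0.0, 0.1, 0.9],
}

graph = build_relation_graph(film_vectors, top_k=1)
for film, neighbours in graph.items():
    print(film, neighbours)
```

Keeping only the `top_k` strongest edges per film is one possible design choice for keeping the graph sparse; a similarity threshold would be an equally plausible alternative.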