Research On Interests Of Sina Weibo Users Based On LDA Topic Model

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Shi
Full Text: PDF
GTID: 2427330602983966
Subject: Applied statistics
Abstract/Summary:
As China's mobile Internet has matured and stabilized, social platforms have paid more attention to diversifying their content and have actively sought innovations and breakthroughs in their models in order to seize each other's market share. Although Sina Weibo continues to hold the leading position in the mobile social industry, fierce competition in the industry has brought great challenges to its development. The core competitiveness of Sina Weibo lies in the communicative impact of its leading user groups and its high-quality original content, which requires the platform to understand users' needs more accurately in the current environment. Focusing on this issue, this thesis studies the interests and preferences of Sina Weibo users.

The LDA (Latent Dirichlet Allocation) probabilistic topic model is a three-layer probabilistic model proposed by Blei et al. Training it yields the probability distribution of each document over the topic space and the probability distribution of each topic over the word space. It is an unsupervised learning method, so it needs no annotated examples and can be applied directly to an unlabeled corpus. In much of the existing research on Sina Weibo users' interest preferences, the Weibo posts created by each user are treated as one document, the model is trained directly on these documents, and the resulting distribution of topic terms for each document is taken as the description of that user's interest preferences.

This thesis adds another inference method. First, a known corpus is used to supervise training and obtain an optimal model. The trained model is then used to semantically mine and analyze the per-user documents in other, unknown corpora. The known corpus is constructed from the classification labels of popular features on Sina Weibo, which ensures that the characteristics of the words used are consistent before and after model inference. In addition, drawing on experience with the Sina Weibo platform and on the platform's development concept in recent years, this thesis puts forward the hypothesis that the content a user has liked in the past should be added to expand the user's document in the empirical research. A questionnaire survey and the empirical research show that this assumption is reasonable in theory and effective in practice. As for data collection, because the Sina Weibo platform restricts access, this thesis designed and developed a crawler system for Sina Weibo in Python in order to collect data for different research needs.
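
The two-stage idea described in the abstract (fit a topic model on a known corpus, then infer topic distributions for user documents from another corpus) can be illustrated with a minimal sketch. This is not the thesis's actual pipeline: it uses a plain collapsed Gibbs sampler rather than whatever training and model-selection procedure the author used, and the toy English tokens, the function names `train_lda`, `infer_topics`, and `encode`, and all hyperparameter values are assumptions made purely for illustration.

```python
import random

def train_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on word-id documents.
    Returns the topic-word count matrix and per-topic totals."""
    rng = random.Random(seed)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token, then resample its topic from the conditional.
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, topic_total

def infer_topics(doc, topic_word, topic_total, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Estimate the topic distribution of an unseen document while keeping
    the trained topic-word distributions fixed."""
    rng = random.Random(seed)
    num_topics, vocab_size = len(topic_word), len(topic_word[0])
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    counts = [0] * num_topics
    zs = []
    for _ in doc:
        z = rng.randrange(num_topics)
        zs.append(z)
        counts[z] += 1
    for _ in range(iters):
        for i, w in enumerate(doc):
            counts[zs[i]] -= 1
            weights = [(counts[k] + alpha) * phi[k][w] for k in range(num_topics)]
            zs[i] = rng.choices(range(num_topics), weights=weights)[0]
            counts[zs[i]] += 1
    denom = len(doc) + num_topics * alpha
    return [(counts[k] + alpha) / denom for k in range(num_topics)]

def encode(tokens, vocab):
    """Map tokens to word ids, dropping words absent from the training vocabulary."""
    return [vocab[t] for t in tokens if t in vocab]

# Hypothetical, pre-tokenized "known corpus" (stand-in for labeled Weibo content).
known_corpus = [
    ["match", "goal", "team", "coach", "goal"],
    ["team", "league", "match", "player"],
    ["recipe", "noodle", "spicy", "restaurant"],
    ["restaurant", "dessert", "recipe", "noodle"],
]
vocab = {w: i for i, w in enumerate(sorted({t for doc in known_corpus for t in doc}))}
train_docs = [encode(doc, vocab) for doc in known_corpus]
topic_word, topic_total = train_lda(train_docs, num_topics=2, vocab_size=len(vocab))

# Hypothetical user document: the user's own posts plus posts the user liked.
user_posts = ["match", "team", "goal"]
user_likes = ["player", "league", "noodle"]
theta = infer_topics(encode(user_posts + user_likes, vocab), topic_word, topic_total)
print(theta)  # estimated topic proportions for this user
```

Concatenating `user_posts` and `user_likes` before inference mirrors the abstract's hypothesis that liked content can extend a user's document. In this sketch, words that do not appear in the known corpus are simply dropped, which is one reason the abstract's point about consistent vocabulary between the two corpora matters.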
Keywords/Search Tags: Users on Sina Weibo, Interest mining and analysis, Topic model, LDA, Crawler system