Research On College News Topics Discovery Based On LDA Topic Model

Posted on: 2020-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: X J Yi
Full Text: PDF
GTID: 2417330578457269
Subject: Computer Science and Technology
Abstract/Summary:
Network media has developed rapidly in the "Internet plus" era, and online news has gradually become people's main source of information. College students, a main force among netizens, rely heavily on the Internet to obtain news. Discovering the topics in the social news that college students browse, tracking which topics attract their attention and where that attention is abnormal, and guiding students to view social news events appropriately are therefore of great significance to educational work in universities. This thesis takes the social news browsed by students as its research object and uses the LDA topic model, which is capable of semantic mining, to represent news texts. After an in-depth study of the technologies currently involved in the topic discovery process, it proposes improvements that address the problems in existing college news topic discovery techniques, so that the discovered topics are more accurate. The specific work is as follows:

(1) Standard LDA text modeling has the problem that topic distributions are skewed toward high-frequency words. To address this, the thesis proposes a title-weighted LDA topic model built on an optimized data preprocessing procedure. First, preprocessing is optimized by combining a stop-word list with weight filtering, which reduces the dimensionality of the text features and, to some extent, lowers the probability assigned to uninformative high-frequency words in the topics. Second, because a news headline summarizes the news content well, a title weighting strategy is introduced into the LDA model, and a title-weight index table is built to improve the Gibbs sampling algorithm used in model training. Finally, experimental results show that the proposed text modeling scheme not only improves the efficiency of model training but also raises the probability of title words within the topics, making topic descriptions more accurate and the differences between topics more pronounced. The text representation is therefore more reasonable, and the accuracy of topic discovery can be improved to some extent.

(2) For the college news topic discovery stage, the thesis proposes a double-layer Single-Pass algorithm. It improves the similarity calculation method used to extract news topic types and addresses the traditional Single-Pass algorithm's sensitivity to document input order. Comparison experiments show that the improved text clustering algorithm clearly improves recall, precision, and F-value, which makes the results of college news topic discovery more accurate.

(3) The topic discovery method proposed in the thesis is applied to collected university news data to obtain the topic types that students follow and the degree of attention each topic receives. The results indicate that research on college news topic discovery can help guide work with college students.
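
To make the order sensitivity mentioned in (2) concrete, the following is a minimal sketch of the traditional (baseline) Single-Pass algorithm, not the double-layer variant proposed in the thesis, whose details the abstract does not give. The function names `cosine` and `single_pass`, the use of cosine similarity with running-mean centroids, the threshold of 0.75, and the toy 2-D vectors standing in for LDA topic distributions are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_pass(vectors, threshold):
    """Baseline Single-Pass clustering (illustrative sketch).

    Each vector is compared with the centroids of the clusters seen so far.
    It joins the most similar cluster if the similarity reaches `threshold`;
    otherwise it starts a new cluster. Returns one label per input vector.
    """
    centroids = []  # running mean vector of each cluster
    sizes = []      # number of members in each cluster
    labels = []
    for vec in vectors:
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            n = sizes[best]
            # Update the running mean of the chosen cluster.
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], vec)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


def unit(degrees):
    """Unit vector at the given angle (toy stand-in for a document vector)."""
    rad = math.radians(degrees)
    return (math.cos(rad), math.sin(rad))


# Three hypothetical documents spaced 40 degrees apart.
docs = {"a": unit(0), "b": unit(40), "c": unit(80)}

# Same documents and threshold, two different input orders.
labels_abc = single_pass([docs[k] for k in "abc"], threshold=0.75)
labels_bca = single_pass([docs[k] for k in "bca"], threshold=0.75)

print(dict(zip("abc", labels_abc)))
print(dict(zip("bca", labels_bca)))
```

With the order a, b, c, documents a and b end up in the same cluster and c is left alone. With the order b, c, a, documents b and c are grouped and a is left alone. The data and threshold are identical in both runs, so the difference comes only from the input order, which is the weakness the thesis sets out to address.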
Keywords/Search Tags:College news, LDA topic model, Gibbs sampling, Single-Pass clustering, Topic discovery