
Research on Chinese-Vietnamese Bilingual News Topic Discovery

Posted on: 2018-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Hou
GTID: 2358330515955926
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet information technology, exchanges between China, Vietnam and other regions in political, economic, cultural and other areas have become increasingly close. News is the main carrier of information exchange between the two countries, so discovering news topics related to both countries in a timely and effective way, and tracking how those topics develop and evolve, has become particularly important. Because Chinese-Vietnamese bilingual translation resources are difficult to construct, the key problems in bilingual topic discovery are how to bridge the cross-language gap and how to follow the development of news topics. The main research contents are as follows:

(1) A Chinese news topic discovery method that incorporates the association relations among news elements. First, term weights are computed with TF-IDF (based on word-frequency statistics) to represent each document, and the cosine similarity between news pages is computed to obtain an initial similarity matrix. Then, the associations between elements that appear in different news documents are used as semi-supervised constraint information to correct this initial matrix. The affinity propagation clustering algorithm is applied to cluster the news documents, and news topics are extracted from the resulting clusters to construct the news topic model. Finally, comparative experiments on annotated news pages show that affinity propagation clustering that incorporates news-element associations performs better than the same method without the constraint information.

(2) A Chinese-Vietnamese cross-language topic discovery method based on bilingual word similarity computed from a comparable corpus. Starting from the Chinese news topics obtained in (1), bilingual word vectors are trained on a Chinese-Vietnamese comparable corpus. Using these vectors, the similarity between each Chinese query word and Vietnamese words is computed, and Vietnamese candidate expansion words are selected according to the similarity values. The Chinese news topics are then extended into Vietnamese through these bilingual word similarities, and Vietnamese news texts are retrieved. A clustering algorithm then groups the retrieved news texts to obtain the individual events reported in the Chinese and Vietnamese news. Comparative experiments show that this query translation and expansion method based on the comparable corpus outperforms the traditional bilingual LDA method in cross-language topic analysis.

(3) Building on these results, the author designed and implemented a system for discovering Chinese-Vietnamese bilingual public-opinion topics. With it, users can follow how a news topic is reported in China and Southeast Asian countries and view the topic's details; the system also provides an experimental platform and related resources for follow-up research on the evolution of Chinese-Vietnamese bilingual news topics.
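The first step of method (1), TF-IDF weighting followed by cosine similarity to build the initial similarity matrix, can be sketched as below. This is a minimal illustration, not the thesis's code: it assumes Chinese word segmentation has already been done, and it uses the common smoothed IDF variant ln((1 + N) / (1 + df)) + 1 because the abstract does not say which TF-IDF formula was used. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document.

    Assumption: smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1,
    so terms that occur in every document keep a non-zero weight.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Initial document-by-document similarity matrix (list of lists)."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


if __name__ == "__main__":
    # Toy, already-segmented documents (hypothetical data).
    news = [
        ["border", "trade", "vietnam"],
        ["border", "trade", "china"],
        ["football", "match"],
    ]
    for row in similarity_matrix(news):
        print([round(x, 3) for x in row])
```

Documents that share no terms get a similarity of exactly 0, and the matrix is symmetric with ones on the diagonal; this is the matrix that the constraint step then corrects.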
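The abstract of method (1) does not specify how the news-element associations correct the similarity matrix. The sketch below therefore assumes a simple, hypothetical rule: pairs of documents linked by a shared news element (must-link) are set to similarity 1.0, and pairs known to be unrelated (cannot-link) are set to 0.0. The corrected matrix is then clustered with a plain implementation of affinity propagation (Frey and Dueck's responsibility/availability message passing). The median off-diagonal similarity as the preference, damping 0.5, and 200 iterations are assumed defaults, not values taken from the thesis.

```python
import statistics


def apply_constraints(sim, must_link, cannot_link):
    """Hypothetical correction rule: return a copy of `sim` in which
    must-link pairs (documents sharing a news element) get similarity 1.0
    and cannot-link pairs get 0.0."""
    corrected = [row[:] for row in sim]
    for i, j in must_link:
        corrected[i][j] = corrected[j][i] = 1.0
    for i, j in cannot_link:
        corrected[i][j] = corrected[j][i] = 0.0
    return corrected


def affinity_propagation(sim, damping=0.5, iterations=200, preference=None):
    """Plain affinity propagation; returns (exemplars, labels).

    labels[i] is the index of the exemplar document that represents i.
    Assumption: preference defaults to the median off-diagonal similarity.
    """
    n = len(sim)
    if n == 1:
        return [0], [0]
    s = [row[:] for row in sim]
    if preference is None:
        preference = statistics.median(
            [s[i][k] for i in range(n) for k in range(n) if i != k])
    for k in range(n):
        s[k][k] = preference  # self-similarity = willingness to be an exemplar
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            positive = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, r[k][k] + total - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    # Each document joins the exemplar it is most similar to.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels


if __name__ == "__main__":
    # Toy similarity matrix for six news documents (hypothetical data).
    sim = [
        [1.0, 0.8, 0.7, 0.05, 0.05, 0.05],
        [0.8, 1.0, 0.6, 0.05, 0.05, 0.05],
        [0.7, 0.6, 1.0, 0.05, 0.05, 0.05],
        [0.05, 0.05, 0.05, 1.0, 0.8, 0.7],
        [0.05, 0.05, 0.05, 0.8, 1.0, 0.6],
        [0.05, 0.05, 0.05, 0.7, 0.6, 1.0],
    ]
    print(affinity_propagation(sim))
    corrected = apply_constraints(sim, must_link=[(2, 3)], cannot_link=[])
    print(affinity_propagation(corrected))
```

Affinity propagation does not need the number of topics in advance, which suits news streams where that number is unknown; the preference value controls how many exemplars (topics) emerge.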
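The query expansion in method (2) can be illustrated with the sketch below. It assumes that Chinese and Vietnamese word vectors already live in one shared space (training them on the comparable corpus is not shown) and uses made-up three-dimensional toy vectors. For each Chinese query word it ranks Vietnamese words by cosine similarity, keeps the top-k above a threshold as candidate expansion words, and then retrieves Vietnamese texts by counting matched expansion words. `top_k`, `threshold`, and the overlap-count retrieval score are hypothetical choices, not the thesis's settings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(zh_terms, zh_vecs, vi_vecs, top_k=2, threshold=0.5):
    """Map each Chinese query term to its most similar Vietnamese words."""
    expansions = {}
    for term in zh_terms:
        if term not in zh_vecs:
            continue  # no vector for this term, so it cannot be expanded
        scored = sorted(
            ((cosine(zh_vecs[term], vec), word) for word, vec in vi_vecs.items()),
            reverse=True)
        expansions[term] = [w for score, w in scored[:top_k] if score >= threshold]
    return expansions


def retrieve(vi_docs, expansions):
    """Rank Vietnamese documents by how many expansion words they contain."""
    query = {w for words in expansions.values() for w in words}
    hits = []
    for doc_id, tokens in vi_docs.items():
        score = len(query & set(tokens))
        if score:
            hits.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(hits, reverse=True)]


if __name__ == "__main__":
    # Hypothetical toy vectors in a shared bilingual space.
    zh = {"边境": [1.0, 0.1, 0.0], "贸易": [0.1, 1.0, 0.0]}
    vi = {"biên giới": [0.9, 0.2, 0.0], "thương mại": [0.2, 0.9, 0.1],
          "bóng đá": [0.0, 0.0, 1.0]}
    expansions = expand_query(["边境", "贸易"], zh, vi)
    print(expansions)
    docs = {"vi1": ["biên giới", "thương mại", "tăng"],
            "vi2": ["bóng đá", "trận"],
            "vi3": ["thương mại", "xuất khẩu"]}
    print(retrieve(docs, expansions))
```

Because the expansion words come from vector similarity rather than a dictionary, this approach can work where bilingual translation resources are scarce; the retrieved Vietnamese texts would then be clustered into events as the abstract describes.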
Keywords/Search Tags: news elements, Hadoop, Chinese-Vietnamese comparable corpus, bilingual word similarity, Chinese-Vietnamese bilingual topic