
Research For Topic Tracking Algorithm Based On Generalized Linear Model

Posted on: 2018-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: C K Piao
GTID: 2428330569478820
Subject: Computer Science and Technology
Abstract/Summary:
News topic tracking is a key research area in network public opinion analysis. It is the process of associating subsequent news reports with a known topic, and it is an effective tool for analyzing public opinion online. One of the principal methods for tracking news topics is text classification. The Generalized Linear Model (GLM), a probabilistic model with fixed parameters, is a widely used classification method that can handle non-linear data and data with non-constant variance without changing the natural metric of the data. However, news topics develop dynamically, and the GLM is not well suited to tracking them: training takes a long time, and the parameters stay fixed for the lifetime of the model. Based on an analysis of how news topics develop and evolve, this thesis modifies the GLM into a non-parametric learning model adapted to the dynamic nature of news topics, addressing the problem that a topic tracking model with fixed parameters cannot adapt to topic development and evolution. The main contributions are as follows:

1. The development and evolution features of news topics are analyzed and summarized, and they are related to a news ontology to provide a text pre-processing method suited to the characteristics of news data. A comparative analysis of the operating principles and applicable conditions of LDA and the principal components method, combined with experiments, leads to the conclusion that, on news data sets, the features produced by dimensionality reduction with the principal components method are mutually independent, which matches the conditions under which the Non-Parametric Generalized Linear Model (Np-GLM) applies.

2. Traditional feature weighting algorithms do not fully capture the category information carried by feature terms. To address this, the chi-squared statistic is modified on the basis of feature weighting algorithms studied under the vector space model, yielding a feature weighting method based on a chi-squared statistic that reflects category-distinguishing ability. The algorithm more accurately extracts the feature words that are most distinctive for news.

3. The GLM cannot describe the dynamic development of news topics. To address this, the natural parameter η of the GLM is analyzed with a Bayesian method and, following the feature independence principle of the vector space model, the conditional probability of η is shown to be equal on data sets with independent features. This property makes it possible to weaken the inner-product assumption on the natural parameter η, which improves the model's generalization ability and its capacity to fit different data sets. The traditional fixed-parameter GLM is then modified, by way of a non-parametric estimation solution, into an Np-GLM suited to the dynamic development of news topics. Theoretical analysis and experimental verification together lead to the conclusion that the Np-GLM is applicable to data sets with relatively low feature correlation.

Finally, the proposed algorithm is verified experimentally. In experiments on UCI standard data sets, the TDT data set, and an Internet news data set, both classification accuracy and F1 score increase, which verifies the effectiveness of the proposed algorithm.
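
The feature weighting contribution builds on the chi-squared statistic. The abstract does not spell out the thesis's modification, so the sketch below shows only the standard, unmodified chi-squared term score for a single class as a baseline. The function name `chi_squared_scores` and the toy documents are assumptions made for illustration and do not come from the thesis.

```python
from collections import Counter


def chi_squared_scores(docs, labels, target):
    """Score each term with the standard chi-squared statistic for one class.

    docs   -- list of token lists (hypothetical pre-tokenized news stories)
    labels -- list of class labels, one per document
    target -- the class (topic) that terms are scored against
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    df_pos = Counter()  # documents in the target class that contain the term
    df_neg = Counter()  # documents outside the target class that contain it
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # document frequency: count each term once
            if y == target:
                df_pos[term] += 1
            else:
                df_neg[term] += 1

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in class, contains term
        b = df_neg[term]   # not in class, contains term
        c = n_pos - a      # in class, lacks term
        d = n_neg - b      # not in class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # Guard against a zero denominator (e.g. a term in every document).
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy data, assumed for illustration only.
docs = [
    ["quake", "rescue"],
    ["quake", "aid"],
    ["election", "vote"],
    ["election", "aid"],
]
labels = ["quake", "quake", "politics", "politics"]
scores = chi_squared_scores(docs, labels, target="quake")
```

On this toy data, a term that appears only in the target class ("quake") receives the highest score, while a term spread evenly across both classes ("aid") scores zero. A category-aware variant of the kind the thesis describes would adjust this baseline, but the exact adjustment is not given in the abstract.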
Keywords: Topic Tracking, Linear Model, Machine Learning, Non-parametric Estimation, Feature Weight