
Improvement And Application Of Bayesian Logistic Regression Text Classification Model

Posted on: 2019-12-24 | Degree: Master | Type: Thesis
Country: China | Candidate: T T Xu
GTID: 2370330548970221 | Subject: Applied Statistics
Abstract/Summary:
Since the emergence of text mining, text classification has been an active research topic in data mining, and many researchers have worked to improve text mining techniques. Numerous methods have been proposed, including text classification based on text similarity, on machine learning, and on rules and knowledge; each has its own advantages and disadvantages. Addressing the shortcomings of existing text classification methods, this thesis focuses on deeper semantic mining and higher classification accuracy in order to process massive amounts of text. It presents an improved text classification model that combines association rules with Bayesian logistic regression. The association rules use textual semantic rules to measure the correlation between word segments and word frequencies, which broadens the model's scope of application, while Bayesian logistic regression is robust and can effectively avoid overfitting. Concretely, association rules are first used to mine the semantic information of the text in depth and obtain a similarity parameter; this similarity parameter is then used to construct a Gaussian prior for the Bayesian logistic regression model. The model is estimated with a sampling method that introduces Polya-Gamma auxiliary variables, and this is compared with an estimation method based on function approximation. As an application, the improved model is applied to the Fudan University Chinese news corpus, which contains about 2,815 articles covering 10 news topics; its size and number of categories make it well suited to text categorization. To verify the classification performance of the model, it is compared with an association rule model, a Bayesian logistic regression model, and a Bayesian logistic regression model with Polya-Gamma auxiliary variables. The results show that the improved model proposed in this thesis improves the efficiency of text categorization. It has practical application value in spam processing and news topic classification.
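The abstract does not spell out which rule measures the thesis uses. As a rough illustration of the standard association-rule quantities only, the sketch below treats each tokenized document as a transaction and mines word-to-word rules using support and confidence. The function name `pair_rules`, the thresholds, and the toy documents are hypothetical, and how the thesis turns such rules into its similarity parameter is not described here.

```python
from collections import Counter
from itertools import combinations


def pair_rules(docs, min_support=0.3, min_confidence=0.6):
    """Mine word -> word association rules (hypothetical helper).

    Each document is treated as one transaction (the set of its words).
      support(A -> B)    = fraction of documents containing both A and B
      confidence(A -> B) = count(A and B) / count(A)
    Returns sorted tuples (lhs, rhs, support, confidence).
    """
    n = len(docs)
    transactions = [set(d) for d in docs]
    # Number of documents containing each single word.
    word_counts = Counter(w for t in transactions for w in t)
    # Number of documents containing each unordered word pair.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A pair yields two directed rules; keep those that are confident enough.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / word_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy, already-segmented documents (hypothetical data).
docs = [
    ["stock", "market", "rise"],
    ["stock", "market", "fall"],
    ["football", "match"],
    ["stock", "price"],
]
print(pair_rules(docs))
# [('market', 'stock', 0.5, 1.0), ('stock', 'market', 0.5, 0.6666666666666666)]
```

Only pairs are mined here for brevity; a full Apriori-style miner would also grow larger itemsets level by level.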
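The abstract names Polya-Gamma data augmentation but gives no details of the sampler. The following is a minimal sketch, assuming the standard Gibbs sampler of Polson, Scott and Windle for binary logistic regression with a Gaussian prior N(b, B): draw each auxiliary variable from PG(1, x_i^T beta), then draw beta from its conditional Gaussian. The names `sample_pg1` and `gibbs_pg_logistic` are hypothetical, `prior_mean`/`prior_cov` are plain inputs standing in for whatever prior the thesis builds from its similarity parameter, and the PG draws use a truncated-sum approximation rather than an exact sampler.

```python
import numpy as np


def sample_pg1(c, rng, n_terms=200):
    """Approximate draws from PG(1, c), one per entry of c.

    Truncates the infinite-sum representation after n_terms terms:
        omega = 1 / (2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(1, 1). Truncation slightly underestimates omega.
    """
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=1.0, scale=1.0, size=c.shape + (n_terms,))
    denom = (k - 0.5) ** 2 + c[..., None] ** 2 / (4 * np.pi ** 2)
    return (g / denom).sum(axis=-1) / (2 * np.pi ** 2)


def gibbs_pg_logistic(X, y, prior_mean, prior_cov, n_iter=1000, burn_in=200, seed=0):
    """Gibbs sampler for binary logistic regression, prior beta ~ N(prior_mean, prior_cov).

    X: (n, p) design matrix, y: (n,) labels in {0, 1}.
    Returns post-burn-in draws of beta with shape (n_iter - burn_in, p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    prior_prec = np.linalg.inv(prior_cov)      # B^-1
    kappa = y - 0.5                            # kappa_i = y_i - 1/2
    beta = np.zeros(p)
    draws = []
    for it in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i^T beta)
        omega = sample_pg1(X @ beta, rng)
        # Step 2: beta | omega, y ~ N(m, V) with V^-1 = X^T Omega X + B^-1
        prec = X.T @ (omega[:, None] * X) + prior_prec
        m = np.linalg.solve(prec, X.T @ kappa + prior_prec @ prior_mean)
        L = np.linalg.cholesky(prec)           # prec = L L^T
        # L^-T z has covariance (L L^T)^-1 = V
        beta = m + np.linalg.solve(L.T, rng.standard_normal(p))
        if it >= burn_in:
            draws.append(beta)
    return np.array(draws)
```

Usage: pass an intercept-augmented feature matrix (for example, word-frequency features) as `X` and 0/1 labels as `y`, then average the returned draws for a posterior-mean estimate. The sampler is binary; for a 10-topic corpus it would have to be wrapped, for instance one-vs-rest, and the abstract does not say how the thesis handles multiple classes.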
Keywords/Search Tags: Text Categorization, Association rules, Bayesian logistic regression, Polya-Gamma auxiliary variable