
Application Of Penalized Logistic Regression Models In Text Classification

Posted on: 2018-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Huang
Full Text: PDF
GTID: 2439330515452671
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of Internet technology, the volume of information in human society is growing exponentially, and fast, accurate classification and recommendation of text information are in great demand. To address the high-dimensional, sparse data problem faced by text classification, this thesis carries out the following research. First, we give a comprehensive introduction to the technologies involved in the text classification process. Second, we summarize the theoretical development of the penalized logistic regression model and, through a literature review, discuss its feasibility for solving the text classification problem. In addition, we propose a new algorithm that combines word vector theory with the penalized logistic regression model for text classification. Third, we compare the penalized logistic regression models experimentally with traditional feature selection methods and traditional classifiers, and implement the proposed text classification algorithm.

The experimental analysis yields the following results. (1) Compared with traditional feature selection methods such as the χ² statistic, the F-statistic and mutual information, the elastic-net logistic model is superior in both accuracy and sparsity. (2) The penalized logistic regression models are comparable to the support vector machine in classification performance, and outperform other traditional classifiers such as naive Bayes and decision trees under both the vector space model and the word vector framework. (3) Compared with the LASSO and elastic-net models, the Group LASSO and Sparse Group LASSO models greatly reduce model complexity and improve classification accuracy.

Finally, we crawl mobile phone review data from Zhongguancun Online with a web crawler and conduct an empirical study of online product review mining. First, we build a sentiment analysis of product reviews based on text classification models. Second, we make a comprehensive study of extracting predictors of product review helpfulness, and build a quantitative model to measure the helpfulness of product reviews. Based on these models, product reviews can be classified and ranked systematically.
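
The abstract names the elastic-net logistic model but gives no solver, data or hyperparameters, so the following is only a minimal sketch of the general technique, not the thesis's implementation. It assumes a bag-of-words representation, a proximal gradient solver, and the objective mean log-loss + lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2). The six toy reviews, the function names and the values of `lam`, `alpha`, `lr` and `n_iter` are all hypothetical.

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.5, n_iter=500):
    """Proximal gradient descent for elastic-net logistic regression.

    The intercept b is left unpenalized. All defaults are illustrative.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term),
            # then soft-thresholding for the L1 term.
            z = w[j] - lr * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            w[j] = soft_threshold(z, lr * lam * alpha)
    return w, b

def vectorize(text, vocab):
    """Bag-of-words counts; tokens outside the vocabulary are ignored."""
    tokens = text.split()
    return [tokens.count(tok) for tok in vocab]

# Hypothetical toy reviews: 1 = positive, 0 = negative.
docs = [
    ("great screen", 1), ("great battery", 1), ("love great camera", 1),
    ("poor screen", 0), ("poor battery", 0), ("hate poor camera", 0),
]
vocab = sorted({tok for text, _ in docs for tok in text.split()})
X = [vectorize(text, vocab) for text, _ in docs]
y = [label for _, label in docs]

w, b = fit_elastic_net_logistic(X, y)
weights = dict(zip(vocab, w))
# Words whose coefficient was not shrunk to exactly zero.
selected = {tok: wj for tok, wj in weights.items() if wj != 0.0}

def predict_proba(text):
    """Estimated probability that a review is positive."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

print(selected)
print(predict_proba("great phone"), predict_proba("poor screen"))
```

On this toy corpus the fit keeps non-zero weights only for "great" and "poor"; words that occur equally in both classes, and the rarer "love" and "hate", are shrunk to exactly zero. That embedded selection is the behaviour the abstract contrasts with separate filter methods such as the χ² statistic, although a real experiment would need a proper corpus and tuned penalties.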
Keywords/Search Tags: Text Classification, Penalized Logistic Regression, Product Review Mining