Application Of PCA Dimensionality Reduction Method Based On Latent Variables In Text Classification Problems

Posted on: 2019-07-07  Degree: Master  Type: Thesis
Country: China  Candidate: J K Lv  Full Text: PDF
GTID: 2438330572454084  Subject: Statistics
Abstract/Summary:
Text categorization is an effective way to handle large amounts of text information. Over the past few decades, key technologies in the field of text categorization have developed significantly. However, because the traditional text representation is high-dimensional and highly sparse, there is still considerable room for improvement in text categorization in the era of big data. This paper addresses the high-dimensional, sparse features of text classification and proposes a principal component analysis method based on latent representation, which improves the performance of principal component analysis. The method assumes that the choice of feature words when a text is generated is determined by latent variables that follow a normal distribution, and it estimates the true value of each feature by its mathematical expectation. Experiments show that the method effectively reduces the feature dimension and achieves better classification results by increasing the weight of a feature word when it appears in the text.
Keywords/Search Tags: Text categorization, Feature dimension reduction, Principal component analysis, Latent representation
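
The abstract does not give the model's equations, so the following Python sketch is only one plausible reading of the idea and should not be taken as the thesis's actual method. It assumes a probit-style model: feature word j appears in a document exactly when a latent variable z ~ N(mu_j, 1) is positive, with mu_j estimated from the word's document frequency. Each observed 0/1 entry is then replaced by the conditional expectation of z, and ordinary PCA is applied to the result. The function names `latent_expectation` and `pca_project`, the unit variance, and the clipping constant `eps` are all illustrative assumptions.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal, used for pdf and inverse cdf


def latent_expectation(X, eps=1e-3):
    """Replace binary word occurrences by expected latent values.

    Assumed model (not taken from the thesis): word j occurs in a document
    iff z > 0, where z ~ N(mu_j, 1) and Phi(mu_j) equals the word's
    document frequency p_j.
    """
    X = np.asarray(X, dtype=float)
    # Document frequency per word, clipped so the inverse cdf stays finite.
    p = np.clip((X > 0).mean(axis=0), eps, 1.0 - eps)
    mu = np.array([_STD_NORMAL.inv_cdf(v) for v in p])
    pdf = np.array([_STD_NORMAL.pdf(m) for m in mu])
    z_present = mu + pdf / p          # E[z | z > 0], truncated normal mean
    z_absent = mu - pdf / (1.0 - p)   # E[z | z <= 0]
    return np.where(X > 0, z_present, z_absent)


def pca_project(Z, n_components):
    """Project the rows of Z onto the top principal components via SVD."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T


# Toy document-term matrix: 6 documents, 4 feature words (made-up data).
X = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
])
Z = latent_expectation(X)
reduced = pca_project(Z, n_components=2)  # shape (6, 2)
```

Under this assumed model, the gap between the "present" and "absent" values is larger for words that occur rarely, which is one way a word's occurrence could receive more weight before dimension reduction. The reduced matrix could then be passed to any standard classifier.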