
Study On The Application Of Random Forests In Text Classification

Posted on: 2020-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: S J Zhang
GTID: 2428330599951735
Subject: Statistics
Abstract/Summary:
With the spread of the Internet and the rapid development of computer technology, a vast amount of information is now generated online. In the era of big data, this information is growing explosively. Massive and complex data must be organized and sorted effectively before the associations within it can be mined, and text classification is one of the key steps in doing so. Random forest is a typical ensemble classifier. By introducing randomness into the construction of a set of decision trees, it achieves high classification accuracy, mitigates over-fitting, and tolerates noise and outliers well. Random forest has been widely used in text classification and has achieved good results there, but the algorithm also has shortcomings, and some aspects still need improvement. This thesis lays out the text classification process systematically and introduces its feature extraction and feature selection steps. Text preprocessing here mainly covers word segmentation, stop-word removal, stemming, feature extraction, and feature selection, with particular emphasis on several feature selection methods. The thesis also briefly introduces several classifier models used in text classification, such as naive Bayes, support vector machines, and the K-nearest neighbor algorithm, along with the performance evaluation metrics for classifiers. Finally, it briefly introduces the theory of random forests and applies text classification and random forest theory to a worked example, classifying financial news from the CNBC website.
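The preprocessing steps named in the abstract (word segmentation, stop-word removal, stemming, feature extraction) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the stop-word list, the suffix-stripping `crude_stem` helper, and the choice of TF-IDF as the feature weighting are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "for"}


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Word segmentation, stop-word removal, then stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document (assumed TF-IDF weighting)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


if __name__ == "__main__":
    print(preprocess("The markets are rising on strong earnings"))
    print(tfidf(["stocks rise", "stocks fall"]))
```

The resulting term-weight vectors are the kind of input a feature selection step and a classifier such as a random forest would then consume. A term that occurs in every document receives weight zero under this weighting, which is one simple reason feature selection can discard it.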
Keywords/Search Tags: text classification, random forest, feature selection