Research On Text Categorization Based On Classification Algorithm

Posted on: 2017-04-19  Degree: Master  Type: Thesis
Country: China  Candidate: M W L Yang  Full Text: PDF
GTID: 2359330566956249  Subject: Applied Statistics
Abstract/Summary:
With the spread of the Internet and the wide adoption of e-commerce, people enjoy shopping online but are also caught in the dilemma of information overload. Obtaining useful information quickly and efficiently from large volumes of text is becoming increasingly important. As a new technique for extracting useful information from large amounts of data, data mining has drawn great interest. Text classification is an important part of text mining, and the classification algorithm is the core of text classification, that is, of a text classifier. This thesis studies the application of classification algorithms to Chinese and English texts. The Naïve Bayes algorithm and the support vector machine (SVM) algorithm are two commonly used classification algorithms, and they show prominent advantages in many respects compared with other classification algorithms. The thesis takes these two algorithms as its main research subject, which gives the work high practical value. I first studied the basic theory of the two algorithms and then applied them to actual data in order to study their application in depth. The data used include Chinese text, English text, large samples, and small samples. Drawing on the literature, the thesis summarizes the characteristics of existing text classifiers and the steps of text classification, and discusses technical methods for Chinese text classification, such as word segmentation. An introduction to the basic theory of Naïve Bayes and SVM provides a clear understanding of the classification algorithms. Finally, the two algorithms are tested in practical application through a variety of data experiments, and the results are summarized. The main tool used in this thesis is R, a very popular statistical software package, which provided a stable and convenient environment for carrying out the study and laid a good technical foundation.
Keywords/Search Tags: Text Classification, Statistical learning theory, Naïve Bayes, SVM, Data Experiment
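
As a rough illustration of the kind of Naïve Bayes text classifier the abstract refers to, the sketch below trains a multinomial Naïve Bayes model with add-one (Laplace) smoothing on a tiny made-up English corpus. It is not the thesis's implementation: the thesis reports using R, whereas this sketch uses plain Python, and the function names `train_naive_bayes` and `predict`, the toy documents, and the labels are all hypothetical. Tokens are split on whitespace, so Chinese text would first need a word segmentation step, as the abstract notes.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # whitespace tokenization (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothed log likelihood of the word given the class.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for illustration.
train_docs = [
    "cheap phone discount sale",
    "discount sale buy now",
    "football match final score",
    "team wins football league",
]
train_labels = ["shopping", "shopping", "sports", "sports"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling `predict(model, "big discount on phone")` on this toy model selects the `shopping` class, because the smoothed word likelihoods for "discount" and "phone" are higher under that class. An SVM classifier would replace the probabilistic scoring with a learned separating hyperplane over the same kind of word-count features.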