Combining machine learning and hierarchical structures for text categorization

Posted on:2002-05-22

Degree:Ph.D.

Type:Dissertation

University:The University of Iowa

Candidate:Ruiz Ruiz, Miguel Enrique

Full Text:PDF

GTID:1468390011991751

Subject:Computer Science

Abstract/Summary:

Text categorization is the process of algorithmically analyzing an electronic document to assign a set of categories (or index terms) that succinctly describe its content. This assignment can be used for classification, filtering, or information retrieval. Machine learning methods such as decision trees, inductive learning, neural networks, support vector machines, linear classifiers, k-nearest neighbor, and Bayesian learning have been applied to this problem, but most of these applications ignore the hierarchical structure of the underlying classification vocabulary.

This dissertation focuses on the use of hierarchical classification structures, such as the UMLS Metathesaurus or the Yahoo! hierarchy of topics, to build and train machine learning algorithms for text categorization. For this purpose we use a variation of the Hierarchical Mixtures of Experts (HME) model adapted for text categorization. We evaluate the HME model using neural networks and linear classifiers as the nodes of the hierarchy. We explore in detail the use of different feature selection and training set selection methods. Experimental results are reported using a large collection of MEDLINE documents (the OHSUMED collection) to assess the effectiveness of the HME model for text categorization.
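
The general idea of routing a document through a category hierarchy can be illustrated with a minimal sketch. This is not the dissertation's model: it assumes logistic linear classifiers at every node and simply multiplies each node's output by the outputs of its ancestors, which is a simplification of HME-style gating. The category names, term weights, biases, and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(weights, bias, features):
    """Logistic output of a linear classifier over a sparse term-weight dict."""
    z = bias + sum(w * features.get(term, 0.0) for term, w in weights.items())
    return sigmoid(z)


class Node:
    """A node of the category hierarchy with its own linear classifier.

    Internal nodes act as gates; leaf nodes act as category experts.
    """

    def __init__(self, name, weights, bias=0.0, children=()):
        self.name = name
        self.weights = weights
        self.bias = bias
        self.children = list(children)


def categorize(node, features, gate=1.0, threshold=0.5, out=None):
    """Walk the hierarchy top-down.

    A node's effective score is its own classifier output multiplied by
    the scores of all its ancestors. Leaves whose effective score reaches
    the threshold are assigned to the document.
    """
    if out is None:
        out = {}
    score = gate * linear_score(node.weights, node.bias, features)
    if not node.children:
        if score >= threshold:
            out[node.name] = score
        return out
    for child in node.children:
        categorize(child, features, score, threshold, out)
    return out


# Hypothetical two-level hierarchy with hand-picked (not learned) weights.
root = Node(
    "Diseases",
    {"patient": 2.0},
    bias=-1.0,
    children=[
        Node("Heart Diseases", {"cardiac": 3.0}, bias=-1.0),
        Node("Neoplasms", {"tumor": 3.0}, bias=-1.0),
    ],
)

# A document represented as term weights (hypothetical values).
doc = {"patient": 1.0, "cardiac": 1.0}
result = categorize(root, doc)
```

With these illustrative weights, only the "Heart Diseases" leaf passes the threshold (effective score about 0.64), while "Neoplasms" is suppressed (about 0.20). In a trained system the weights at each node would be learned from documents belonging to that part of the hierarchy rather than set by hand.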

Keywords/Search Tags:

Text categorization, Machine learning, Hierarchical, HME

Related items

1	Research Of Hierarchical Text Categorization System Based On VSM And Rule Matching
2	A Study On Text Categorization Based On Machine Learning
3	Automatic Categorization Of Chinese Journal Papers Based On Machine Learning
4	Text Categorization On Machine Learning Algorithm
5	Research On The Method Of Chinese Text Categorization Based On Machine Learning
6	The Research On Text Categorization Technology Based On Hierarchical Categorization And Ensemble Learning
7	Text Categorization Algorithm Based On Machine Learning
8	Research And Implement Of Chinese Text Categorization Algorithm Based On SVM
9	Fast Text Categorization Research
10	Research On Text Categorization Method Oriented To Content Security