Combining machine learning and hierarchical structures for text categorization

Posted on: 2002-05-22
Degree: Ph.D.
Type: Dissertation
University: The University of Iowa
Candidate: Ruiz Ruiz, Miguel Enrique
Full Text: PDF
GTID: 1468390011991751
Subject: Computer Science
Abstract/Summary:
Text categorization is the process of algorithmically analyzing an electronic document to assign a set of categories (or index terms) that succinctly describe the content of the document. This assignment can be used for classification, filtering, or information retrieval purposes. Machine learning methods such as decision trees, inductive learning, neural networks, support vector machines, linear classifiers, k-nearest neighbor, and Bayesian learning have been applied to this problem, but most of these applications ignore the hierarchical structure of the underlying classification vocabulary.

This dissertation focuses on the use of hierarchical classification structures, such as the UMLS Metathesaurus or the Yahoo! hierarchy of topics, to build and train machine learning algorithms for text categorization. For this purpose we use a variation of the Hierarchical Mixtures of Experts (HME) model adapted for text categorization. We evaluate the HME model using neural networks and linear classifiers as the nodes of the hierarchy. We explore in detail the use of different feature and training set selection methods. Experimental results are reported using a large collection of MEDLINE documents (the OHSUMED collection) to assess the effectiveness of the HME model for text categorization.
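The idea of placing classifiers at the nodes of a category hierarchy can be illustrated with a small sketch. The code below is not the dissertation's implementation: it assumes that each internal node acts as a gate, that each leaf acts as an expert for one category, that every node is a logistic (linear) classifier over term weights, and that a document reaches a node only if every gate above it scores at or above a threshold. The names `Node` and `categorize`, the example categories, and all weights are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the category hierarchy (hypothetical structure).

    Internal nodes play the role of gates, leaves the role of experts.
    Each node is a linear classifier with a sigmoid output.
    """
    name: str
    weights: dict          # term -> weight (illustrative values, not learned)
    bias: float
    children: list = field(default_factory=list)

    def score(self, features):
        # Weighted sum of the document's term features, squashed to (0, 1).
        z = self.bias + sum(self.weights.get(t, 0.0) * v
                            for t, v in features.items())
        return 1.0 / (1.0 + math.exp(-z))


def categorize(node, features, threshold=0.5):
    """Return {category: score} for every leaf reached through open gates."""
    s = node.score(features)
    if s < threshold:
        return {}                      # gate closed or expert rejects
    if not node.children:
        return {node.name: s}          # leaf expert assigns its category
    assigned = {}
    for child in node.children:
        assigned.update(categorize(child, features, threshold))
    return assigned


# A toy two-level hierarchy with made-up weights.
tree = Node("Diseases", {"patient": 1.0}, -0.5, [
    Node("Neoplasms", {"tumor": 2.0}, -1.0),
    Node("Infections", {"virus": 2.0}, -1.0),
])

doc = {"patient": 1.0, "tumor": 1.0}   # term -> feature value
result = categorize(tree, doc)
print(sorted(result))
```

In this sketch a document with no matching terms never passes the root gate, so none of the experts below it are evaluated. This is one way a hierarchy can prune the set of candidate categories; how gates and experts are trained and combined in the actual HME variant is not specified in the abstract.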
Keywords/Search Tags:Text categorization, Machine learning, Hierarchical, HME