Using background knowledge to improve text classification

Posted on:2003-09-30

Degree:Ph.D

Type:Dissertation

University:Rutgers The State University of New Jersey - New Brunswick

Candidate:Zelikovitz, Sarah

GTID:1468390011979132

Subject:Computer Science

Abstract/Summary:

Automatic text categorizers use a corpus of labeled textual strings or documents to assign the correct label to previously unseen strings or documents. Often the given set of labeled examples, or “training set”, is insufficient to solve this problem. Our approach has been to incorporate readily available information into the learning process to allow for the creation of more accurate classifiers. We term this additional information “background knowledge.”

We provide a framework for the incorporation of background knowledge into three distinct text classification learners. In the first approach, we show that background knowledge can be used as a set of unlabeled examples in a generative model for text classification. Using the methodology of other researchers who treat the classes of unlabeled examples as missing values, we show that although this background knowledge may be of a different form and type than the training and test sets, it can still be quite useful. Second, we view the text classification task as one of information integration using WHIRL, a tool that combines database functionalities with techniques from the information-retrieval literature. We treat the labeled data, test set, and background knowledge as three separate databases and use the background knowledge as a bridge to connect elements from the training set to the test set. In this way, training examples are related to a test example in the context of the background knowledge. Finally, we use Latent Semantic Indexing in conjunction with background knowledge. In this case, background knowledge is used with the labeled examples to create a new space in which the training and test examples are redescribed. This allows the system to incorporate information from the background knowledge in the similarity comparisons between training and test examples.
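
The bridging idea in the second approach can be illustrated with a small sketch. This is not WHIRL and not the dissertation's implementation; it is a simplified stand-in that assumes TF-IDF vectors with cosine similarity, and scores each label by summing sim(training example, background item) × sim(background item, test example) over all pairs. The function names, the weighting scheme, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one unit-length sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Assumed weighting: log-scaled term frequency times smoothed IDF.
        vec = {t: math.log(1 + c) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def classify_with_bridge(train, background, test_doc):
    """Pick the label whose training examples connect best to the test
    document through the background items (hypothetical scoring rule)."""
    texts = [text for text, _ in train] + background + [test_doc]
    vecs = tfidf_vectors(texts)
    train_vecs = vecs[:len(train)]
    bg_vecs = vecs[len(train):-1]
    test_vec = vecs[-1]
    scores = defaultdict(float)
    for (_, label), tv in zip(train, train_vecs):
        for bv in bg_vecs:
            # A background item contributes only if it resembles both sides.
            scores[label] += cosine(tv, bv) * cosine(bv, test_vec)
    return max(scores, key=scores.get), dict(scores)


# Toy data (invented): the test string shares no word with any training string.
train = [("jaguar speed engine", "cars"), ("python snake venom", "animals")]
background = [
    "the engine of a sports car delivers horsepower and torque",
    "a reptile such as a snake or cobra uses venom to hunt prey",
]
test_doc = "cobra reptile"

label, scores = classify_with_bridge(train, background, test_doc)
print(label, scores)
```

In this toy setup a direct comparison between the test string and the training strings finds no overlap at all, so only the background text can relate them. A real system would need a better tokenizer and a principled way to combine evidence from multiple bridges; the plain sum used here is only one possible choice.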

Keywords/Search Tags:

Background knowledge, Text classification, Examples, Training, Labeled, Information

Related items

1	A Study On Learning From Positive And Unlabeled Examples
2	Background Learning Based Iterative Framework For Text Classification
3	The Application Of Improved Labeled LDA Model In The Classification Of Technical Video Text
4	A Text Classification Method Based On Deep Learning And Labeled-LDA
5	A Study On Few-Shot Imbalanced Short Text Classification
6	Design And Implementation Of Finance News Classification System Based On Labeled-LDA
7	Text Classification Based On Improved Labeled-LDA
8	Research On Text Classification For Proposals And Construction Of Domain Knowledge Graph
9	Research On Adversarial Examples For Chinese Text Classification Models
10	Short Text Classification Method Combining Statistical Information And Conceptual Information Of Knowledge Base