Bayesian Text Segmentation for Terminology Extraction

Posted on: 2013-08-22

Degree: M.S.

Type: Thesis

University: University of California, Irvine

Candidate: Koilada, Nagendra

GTID: 2455390008468483

Subject: Artificial Intelligence

Abstract/Summary:

Automatically extracting terminology and index terms from scientific literature is useful for a variety of digital library, indexing, and search applications. The task is non-trivial: it is complicated by domain-specific terminology and by the steady introduction of new terms. Correctly identifying nested terminology (terms that occur inside longer terms) is both interesting and challenging. Commonly used approaches rely on knowledge of document structure and on supervised learning techniques to retrieve terminology. We present a new approach, Dirichlet Process Segmentation (DP-Segmentation), to identify key terms. It is a Bayesian technique based on a probabilistic generative model for the production of multi-word segments. DP-Segmentation outperforms previous methods on the problem of extracting nested multi-word terminology. The method is also robust: it is language-independent and requires neither parsing nor part-of-speech tagging. As such, DP-Segmentation has potential applications beyond the extraction of index terms, such as segmenting Chinese text.
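The abstract describes DP-Segmentation only as a Bayesian generative model for multi-word segments. To illustrate that general idea, and not the thesis's actual model, here is a minimal Python sketch that places a Dirichlet process prior over segments and Gibbs-samples the boundaries between adjacent words. Assumptions: the base distribution P0 (a geometric length prior times unigram word frequencies), the concentration parameter `alpha`, the length parameter `p_stop`, and the helper names `base_prob`, `predictive`, `segments_of`, and `gibbs_segment` are hypothetical choices made for this example. The model is deliberately simplified: a single unigram Dirichlet process, with no handling of nested terms and no term ranking.

```python
# Illustrative sketch only; not the thesis's DP-Segmentation implementation.
import random
from collections import Counter


def base_prob(segment, word_probs, p_stop=0.5):
    """Hypothetical base distribution P0 over multi-word segments: a geometric
    prior on segment length (in words) times independent unigram word probabilities."""
    prob = p_stop * (1 - p_stop) ** (len(segment) - 1)
    for word in segment:
        prob *= word_probs[word]
    return prob


def predictive(segment, counts, total, alpha, word_probs):
    """Dirichlet process (Chinese restaurant process) predictive probability:
    P(s | others) = (n_s + alpha * P0(s)) / (n + alpha)."""
    return (counts[segment] + alpha * base_prob(segment, word_probs)) / (total + alpha)


def segments_of(tokens, bounds):
    """Cut a token list into tuples at every position i where bounds[i] is True."""
    cuts = [i for i, is_cut in enumerate(bounds) if is_cut]
    return [tuple(tokens[a:b]) for a, b in zip(cuts, cuts[1:])]


def gibbs_segment(sentences, alpha=1.0, iterations=200, seed=0):
    """Gibbs-sample segment boundaries between the words of non-empty sentences."""
    rng = random.Random(seed)
    vocab = Counter(w for s in sentences for w in s)
    n_words = sum(vocab.values())
    word_probs = {w: c / n_words for w, c in vocab.items()}

    # bounds[k][i] is True when a segment starts at token i of sentence k;
    # positions 0 and len(sentence) are always boundaries.
    bounds = [[True] + [rng.random() < 0.5 for _ in s[1:]] + [True] for s in sentences]
    counts = Counter(seg for s, b in zip(sentences, bounds) for seg in segments_of(s, b))
    total = sum(counts.values())

    for _ in range(iterations):
        for s, b in zip(sentences, bounds):
            for i in range(1, len(s)):
                start = max(j for j in range(i) if b[j])
                end = min(j for j in range(i + 1, len(s) + 1) if b[j])
                left, right, merged = tuple(s[start:i]), tuple(s[i:end]), tuple(s[start:end])

                # Remove the segment(s) touching position i from the counts.
                if b[i]:
                    counts[left] -= 1
                    counts[right] -= 1
                    total -= 2
                else:
                    counts[merged] -= 1
                    total -= 1

                # Hypothesis 1: no boundary, so one merged segment.
                p_join = predictive(merged, counts, total, alpha, word_probs)
                # Hypothesis 2: boundary; "right" is drawn after "left" is counted.
                p_split = predictive(left, counts, total, alpha, word_probs)
                counts[left] += 1
                p_split *= predictive(right, counts, total + 1, alpha, word_probs)
                counts[left] -= 1

                # Sample the boundary and add the chosen segment(s) back.
                b[i] = rng.random() < p_split / (p_split + p_join)
                if b[i]:
                    counts[left] += 1
                    counts[right] += 1
                    total += 2
                else:
                    counts[merged] += 1
                    total += 1

    return [segments_of(s, b) for s, b in zip(sentences, bounds)]


if __name__ == "__main__":
    corpus = [
        "we apply dirichlet process segmentation to abstracts".split(),
        "dirichlet process segmentation finds multi word terms".split(),
        "a dirichlet process prior favors reused segments".split(),
    ]
    for segs in gibbs_segment(corpus, alpha=0.5, iterations=100):
        print([" ".join(seg) for seg in segs])
```

Under the Dirichlet process, a segment that has already been generated many times becomes cheaper to reuse, so recurring multi-word phrases tend to be kept whole. Because the sampler only sees token sequences, the same sketch could in principle be run on sequences of characters instead of words, which is the kind of setting the abstract mentions for Chinese text.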

Keywords/Search Tags:

Terminology, Terms

Related items

1	Research On Key Terms Of Terminology Science
2	C-E Translation Of Scientific Terms In The Operating Manual Of ZP Series Polishing Machine
3	A Study On The Determinization Of Russian Dramatic Terminology From The Perspective Of Terminology
4	Research On Key Terms Of Cognitive Terminology
5	A Study On English-Chinese Terminology Database Construction Of Latin Dance Movement Terms
6	A Report On E-C Translation Of The Terms In 1000 Best Wine Secrets
7	Unplanned terminology development: A synchronic and diachronic study on economic terms in Turkish newspapers
8	Bayesian Text Segmentation for Terminology Extraction
9	Reassessment On The English Translation Standard Of TCM Terminology From The Perspective Of Modern Terminology
10	Linguistic Research Of Calligraphy Terms In Tang Dynasty