Chinese Keyword Extraction By Term Positions

Posted on:2004-08-02

Degree:Doctor

Type:Dissertation

Country:China

Candidate:WANG Jiayue

Full Text:PDF

GTID:1115360092985737

Subject:Linguistics and Applied Linguistics

Abstract/Summary:

Keywords are the best content descriptors and are more effective than other index terms for information retrieval (IR) systems, especially now that rapidly growing information sources are putting retrieval precision in the spotlight. Statistics-based IR and keyword extraction (KE) systems view documents as bags of unordered words and treat all index terms as equally important, without regard to their syntactic position. This paper tests the intuition that the syntactic position of Chinese nominal phrases is helpful for keyword extraction, and compares the results with those of KE based on text position, a more widely used dimension.

Web pages can be treated in much the same way as normal text. Our investigation of several web search engines shows that their conceptions of relevance differ. Based on a detailed discussion of relevance, we argue that the operability of system-oriented relevance has not been well linked to the rich findings of user-oriented relevance studies. We conclude that topical relevance is the stance web search engines ought to take, and it is the one assumed in the present research. Topic extraction based on human intuition is believed to be a promising direction worth pursuing: by extracting topic words, the subset of documents that really matches the user's information need can be clearly determined, unlike "standard" retrieval systems, which only decide which documents are possibly relevant. Provided that such human intuitions about relevance can be well described, topically relevant results can be successfully retrieved and the output of the IR system will be more satisfactory.

We conducted a corpus-based study of (a) the relation between text position and keywordhood and (b) the relation between syntactic position and keywordhood. Attention is focused on Base NPs, which are manually annotated in a collection of technical documents, with their text position (title, introduction/conclusion) and syntactic position (subject, verb complement, etc.) marked according to a pre-designed scheme. The statistical results of the first experiment showed a high correlation between the Base NPs' syntactic position and their potential to be keywords. Subsequent experiments confirmed the belief that text position was helpful for KE, but syntactic position appeared not to be, which led to the conclusion that text position was more valuable than syntactic position with regard to KE.
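
As a rough illustration of the kind of position-keywordhood tabulation the abstract describes, the Python sketch below computes, for each position label, the share of annotated Base NPs that are keywords. It is only a sketch under stated assumptions: the record layout, the label names, the function keyword_rate_by, and all the sample data are invented for illustration and are not taken from the dissertation, whose actual annotation scheme and statistics are not given here.

```python
from collections import defaultdict

# Toy records, invented for illustration only (not data from the dissertation):
# (base_np, text_position, syntactic_position, is_keyword)
ANNOTATED_NPS = [
    ("keyword extraction", "title", "subject", True),
    ("syntactic position", "introduction", "subject", True),
    ("retrieval system", "body", "verb_complement", False),
    ("search engine", "body", "subject", False),
    ("text position", "title", "verb_complement", True),
    ("user", "body", "verb_complement", False),
]

IS_KEYWORD = 3  # index of the keyword flag in each record


def keyword_rate_by(records, field_index):
    """Return {position label: share of Base NPs with that label that are keywords}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        label = record[field_index]
        totals[label] += 1
        if record[IS_KEYWORD]:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Keyword rates grouped by text position (field 1) and by syntactic position (field 2).
text_rates = keyword_rate_by(ANNOTATED_NPS, 1)
syntax_rates = keyword_rate_by(ANNOTATED_NPS, 2)

for name, rates in (("text position", text_rates), ("syntactic position", syntax_rates)):
    for label, rate in sorted(rates.items()):
        print(f"{name:>18} | {label:<16} | keyword rate = {rate:.2f}")
```

Comparing how widely the rates spread across labels in each grouping gives a first, informal impression of which dimension separates keywords from non-keywords; a real study would need a proper statistical test on a much larger annotated corpus.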

Keywords/Search Tags:

Extraction

Related items

1	Chinese Keyword Extraction By Term Positions
2	A Report On The Comparison Of The Efficiency Of Automatic Bilingual Term Extraction Tools
3	Emotion Congruence And Its Effects On Memory Encoding And Extraction
4	Research On English Named Entity Extraction
5	Information Extraction Of Patients’ Self-Description Aiming At Online Diagnosis
6	English-Chinese Comparisons: Influence Of Conceptual Features On Word Meaning Extraction
7	A Comparative Analysis Of Approaches To Automatic Collocation Extraction
8	Extraction And Generation Of Sketches Of Painted Cultural Relics Based On Deep Learning
9	Extraction Of Historical Relations Based On Multiple Attention Mechanisms
10	A Study On Chinese Terms Extraction And Their Application