Automatic noun phrase extraction from full Chinese text

Posted on: 1998-10-26
Degree: Ph.D
Type: Thesis
University: The Chinese University of Hong Kong (People's Republic of China)
Candidate: Li, Wenjie
Full Text: PDF
GTID: 2465390014977876
Subject: Language
Abstract/Summary:
This thesis presents CNPext, a new statistics-based partial parser for extracting maximal-length noun phrases from Chinese text. Given running Chinese text as input, CNPext performs (1) noun phrase boundary determination and (2) ambiguity resolution for relative clause and prepositional phrase modifiers. The noun phrase extraction module consists of two stages: it first finds all boundary candidates, and then pairs the opening and ending candidates to form the final noun phrases.

Our system is superior to other noun phrase extraction systems in that it can resolve structural ambiguities, a problem faced by many natural language processing systems. Other systems fail to do so because they cannot handle the ambiguities introduced by relative clause and prepositional phrase modifiers. However, our experiments showed that purely statistics-based approaches using part-of-speech tags are not adequate for this purpose; higher-level semantic information is needed. Our proposed algorithm uses the semantic class relation between a verb-noun (or preposition-noun) pair, derived from the standard Chinese thesaurus, to determine which phrase structure is more semantically acceptable.

Our work is the first comprehensive attempt at automatic Chinese noun phrase extraction. It not only proposes an effective way to automatically extract noun phrases from large running texts but also gives impetus to work in similar areas, e.g. verb phrase extraction.

Exploring effective methods for a complete Chinese noun phrase extraction system is a challenging exercise. We hope this project has provided some insight into the problems, if not complete solutions, and will enable the development of advanced, practical Chinese information processing systems to handle the ever-growing volume of information.
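
The two-stage extraction described in the abstract (find boundary candidates, then pair opening and ending candidates) can be illustrated with a minimal Python sketch. This is not the CNPext implementation: the thesis uses a statistical model, whereas the sketch below substitutes simple hand-written rules so the control flow is visible. The tag set `NP_INTERNAL`, the function names, and the sample sentence are all hypothetical.

```python
# Hypothetical tag set: tags assumed to be allowed inside a noun phrase.
# CNPext itself is statistics-based; these rules are only a stand-in.
NP_INTERNAL = {"DET", "NUM", "ADJ", "NOUN"}


def find_candidates(tags):
    """Stage 1: collect opening and ending boundary candidates."""
    opens, closes = [], []
    for i, tag in enumerate(tags):
        if tag not in NP_INTERNAL:
            continue
        prev_tag = tags[i - 1] if i > 0 else None
        next_tag = tags[i + 1] if i + 1 < len(tags) else None
        # A phrase may open where the previous token is outside any NP.
        if prev_tag not in NP_INTERNAL:
            opens.append(i)
        # A phrase may end on a noun that is not followed by NP material.
        if tag == "NOUN" and next_tag not in NP_INTERNAL:
            closes.append(i)
    return opens, closes


def pair_candidates(tags, opens, closes):
    """Stage 2: pair each opening with the farthest valid ending."""
    phrases = []
    for start in opens:
        best = None
        for end in closes:  # closes is in ascending order
            span = tags[start:end + 1]
            if end >= start and all(t in NP_INTERNAL for t in span):
                best = end  # keep the farthest end: maximal length
        if best is not None:
            phrases.append((start, best))
    return phrases


def extract_noun_phrases(tagged):
    """Run both stages on a list of (token, tag) pairs."""
    tokens = [word for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    opens, closes = find_candidates(tags)
    return [tokens[s:e + 1] for s, e in pair_candidates(tags, opens, closes)]
```

For the hypothetical tagged input `[("他", "PRON"), ("买", "VERB"), ("新", "ADJ"), ("电脑", "NOUN"), ("和", "CONJ"), ("书", "NOUN")]`, `extract_noun_phrases` returns `[["新", "电脑"], ["书"]]`. The sketch does not attempt the second task named in the abstract, resolving relative clause and prepositional phrase ambiguities with thesaurus-derived semantic classes.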
Keywords/Search Tags: Noun phrase, Chinese