Exploiting linguistic knowledge to infer properties of neologisms

Posted on: 2011-07-10

Degree: Ph.D.

Type: Thesis

University: University of Toronto (Canada)

Candidate: Cook, C. Paul

GTID: 2445390002969425

Subject: Computer Science

Abstract/Summary:

Neologisms, or newly coined words, pose problems for natural language processing (NLP) systems. Because they have been coined so recently, neologisms are typically not listed in computational lexicons, the dictionary-like resources that many NLP applications depend on. Therefore, when a neologism is encountered in a text being processed, the performance of an NLP system will likely suffer from the missing word-level information. Identifying and documenting the usage of neologisms is also a challenge in lexicography, the making of dictionaries. The traditional approach to these tasks has been to manually read large amounts of text. However, given the vast quantities of text now being produced, particularly in electronic media such as blogs, it is no longer possible to analyze it all manually in search of neologisms. Methods for automatically identifying neologisms and inferring their syntactic and semantic properties would therefore address problems encountered in both natural language processing and lexicography. Because neologisms are typically infrequent, owing to their recent addition to the language, approaches to automatically learning word-level information that rely on statistical distributional information are in many cases inappropriate. Moreover, neologisms occur in many domains and genres, so approaches relying on domain-specific resources are also inappropriate. The hypothesis of this thesis is that knowledge about etymology, including word formation processes and types of semantic change, can be exploited for the acquisition of aspects of the syntax and semantics of neologisms. Evidence supporting this hypothesis is found in three case studies: lexical blends (e.g., webisode, a blend of web and episode), text messaging forms (e.g., any1 for anyone), and ameliorations and pejorations (e.g., the use of sick to mean 'excellent', an amelioration).
In addition, this thesis presents the first computational work on lexical blends and on ameliorations and pejorations, as well as the first unsupervised approach to text message normalization.
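
To make the lexical-blend case study concrete: a blend such as webisode can be read as a prefix of one source word (web) followed by a suffix of another (episode). The Python sketch below enumerates such prefix/suffix splits against a word list. It is only an illustration under stated assumptions, not necessarily the method used in the thesis (the abstract does not describe one). The function name `blend_candidates`, the `min_len` parameter, and the toy lexicon are all hypothetical, and the sketch requires exact prefix and suffix matches.

```python
def blend_candidates(blend, lexicon, min_len=2):
    """Hypothetical helper: list (first, second) source-word pairs such that
    some prefix of `first` followed by some suffix of `second` spells `blend`.

    Assumes `lexicon` is a plain list of known words; each source word must
    contribute at least `min_len` characters (an arbitrary illustrative choice).
    """
    pairs = []
    seen = set()
    # Try every split point that leaves at least min_len characters on each side.
    for i in range(min_len, len(blend) - min_len + 1):
        prefix, suffix = blend[:i], blend[i:]
        # Words that could supply the start of the blend.
        firsts = [w for w in lexicon if w.startswith(prefix)]
        # Words that could supply the end of the blend.
        seconds = [w for w in lexicon if w.endswith(suffix)]
        for first in firsts:
            for second in seconds:
                pair = (first, second)
                # Skip self-pairs and pairs already found at another split point.
                if first != second and pair not in seen:
                    seen.add(pair)
                    pairs.append(pair)
    return pairs


if __name__ == "__main__":
    toy_lexicon = ["web", "episode", "website", "weather", "isotope"]
    print(blend_candidates("webisode", toy_lexicon))
```

On this toy lexicon the call returns `[("web", "episode"), ("website", "episode")]`. The spurious second pair suggests why a practical system would presumably also need some way to rank candidate source words rather than simply list them.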
