
A corpus-based delimitation of new words: Cross-segment comparison and morphological productivity

Posted on: 2005-08-13
Degree: Ph.D
Type: Dissertation
University: City University of New York
Candidate: Nishimoto, Eiji
Full Text: PDF
GTID: 1455390008494233
Subject: Language
Abstract/Summary:
The dissertation explores methods of identifying new words in a large corpus of texts, the British National Corpus (BNC) of 100 million English words, and of assessing productivity in derivational affixation. Adopting a smoothing technique, deleted estimation, from the Language Technology literature, we show that new words can be detected when segments of a corpus are cross-compared to find which word types are shared (or unshared). When each corpus segment is created so as to reflect a set of words used by a group of randomly sampled speakers, through a randomization respecting document boundaries, the cross-comparison of corpus segments can be interpreted as revealing the usage distribution of words across groups of speakers. A word shared by fewer corpus segments is more limited in its usage commonality and thus a more likely candidate for a new word. Morphological productivity, the potential of a word formation process involving an affix to form a new word, is assessed for 12 English derivational suffixes (nominal -ness, -ity, -er, -ee, -ion, -ment, and -th; verbal -ize and -ify; adjectival -ish and -ous; adverbial -ly), based on new words identified in the BNC via deleted estimation. Quantifying the usage distribution of new word types across corpus segments opens many possibilities for assessing the productivity of affixes. Cross-comparing as few as two corpus segments offers a crude yet computationally simple method of separating new words (unshared) from non-new words (shared), to yield a productivity index for a given affix. Cross-comparing as many as six corpus segments supports a graded definition of a word's newness (words shared by fewer corpus segments being more likely new) and thereby a more detailed characterization of the productivity of affixes. The proposed methods of identifying new words and assessing productivity are shown to offer valuable insights into the issue of productivity in word formation.
Keywords/Search Tags:New words, Corpus, Productivity
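
The cross-segment comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation and does not perform deleted estimation itself; it only shows the underlying idea of dealing whole documents into segments and counting how many segments share each word type. The function names (`make_segments`, `segment_counts`, `new_word_candidates`), the `max_shared` threshold, and the plain string-suffix match (which would wrongly accept a word such as "business" for -ness) are all assumptions made for illustration.

```python
import random
from collections import Counter


def make_segments(documents, n_segments, seed=0):
    """Shuffle whole documents and deal them into n_segments sets of word types.

    Each document is a list of tokens. Documents are never split, so the
    randomization respects document boundaries (hypothetical helper).
    """
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)
    segments = [set() for _ in range(n_segments)]
    for i, doc in enumerate(docs):
        segments[i % n_segments].update(doc)
    return segments


def segment_counts(segments):
    """Map each word type to the number of segments that contain it."""
    counts = Counter()
    for segment in segments:
        counts.update(segment)  # each segment is a set, so a type counts once
    return counts


def new_word_candidates(segments, suffix, max_shared=1):
    """Return word types ending in `suffix` found in at most `max_shared` segments.

    The suffix test is a naive string match, assumed here for brevity; real
    morphological analysis would be needed to exclude non-derived words.
    """
    counts = segment_counts(segments)
    return sorted(
        word
        for word, n in counts.items()
        if word.endswith(suffix) and n <= max_shared
    )


if __name__ == "__main__":
    # Two hand-built segments: "happiness" is shared, the other two are not.
    segments = [
        {"the", "happiness", "bloggishness"},
        {"the", "happiness", "truthiness"},
    ]
    print(new_word_candidates(segments, "ness"))
    # prints ['bloggishness', 'truthiness']
```

With two segments and `max_shared=1`, this corresponds to the crude shared/unshared split mentioned in the abstract; with six segments, the values returned by `segment_counts` give the graded view, where a lower count marks a more likely new word.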