Automatic identification of cognates, false friends, and partial cognates

Posted on:2007-10-17

Degree:M.C.S

Type:Thesis

University:University of Ottawa (Canada)

Candidate:Frunza, Oana Magdalena

GTID:2445390005469808

Subject:Computer Science

Abstract/Summary:

Cognates are words in different languages that have similar spelling and meaning. They can help second-language learners with vocabulary expansion and reading comprehension tasks. Special attention needs to be paid to pairs of words that look similar but are in fact false friends: they have different meanings in all contexts. Partial cognates are pairs of words in two languages that have the same meaning in some, but not all, contexts. Detecting the actual meaning of a partial cognate in context can be useful for Machine Translation and Computer-Assisted Language Learning tools.

Our research on cognate and false-friend words for a pair of languages (French and English in our case) consists of automatically classifying a pair of words from the two languages as cognates or false friends. We use Machine Learning techniques with several measures of orthographic similarity as features for classification. We study the impact of selecting different features, averaging them, and combining them through Machine Learning techniques. The methods work on other pairs of languages as long as a small number of annotated word pairs is provided as training data.

In addition to the work on cognate and false-friend identification, we propose a supervised method and a semi-supervised method that uses bootstrapping for disambiguating partial cognates between French and English. The proposed methods use only automatically labeled data and can therefore be applied to other pairs of languages as well. The data we use is automatically collected from parallel corpora. We also take into account the impact of data collected from different domains.

To complement our studies of cognates, false friends, and partial cognates, we developed an annotation tool for these special types of words. The tool can automatically annotate cognates, false friends, and partial cognates in any French text. It uses UIMA (Unstructured Information Management Architecture) from IBM and BaLIE (an open-source Java project designed to extract information from free text).
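The cognate/false-friend classification described in the abstract, which uses orthographic similarity measures as features and averages them, can be illustrated with a minimal sketch. It rests on several assumptions. The abstract does not list the measures actually used, so the three below (normalized edit similarity, longest common subsequence ratio, and character-bigram Dice coefficient) are common choices picked for illustration. All function names are hypothetical, and the word pairs are illustrative rather than data from the thesis. The sketch computes a feature vector for a French-English pair and implements only the "averaging" baseline, in which the mean of the measures is compared against a threshold chosen on a small labeled training set.

```python
import unicodedata
from collections import Counter

# Hypothetical sketch: the three measures below are assumed examples of
# orthographic similarity; the thesis's exact feature set is not given here.

def strip_accents(word: str) -> str:
    """Remove diacritics so that e.g. 'liberté' compares as 'liberte'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def ned(a: str, b: str) -> float:
    """Normalized edit similarity: 1 - distance / length of the longer word."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length / length of the longer word."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest

def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams, counted as multisets."""
    ba = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:  # both words are shorter than two characters
        return 1.0 if a == b else 0.0
    return 2 * sum((ba & bb).values()) / total

def features(fr: str, en: str) -> list[float]:
    """Feature vector for a French-English word pair (lowercased, accents removed)."""
    fr, en = strip_accents(fr.lower()), strip_accents(en.lower())
    return [ned(fr, en), lcsr(fr, en), dice(fr, en)]

def average_score(fr: str, en: str) -> float:
    """'Averaging' baseline: the mean of all similarity measures."""
    f = features(fr, en)
    return sum(f) / len(f)

def learn_threshold(pairs: list[tuple[str, str, bool]]) -> float:
    """Pick the averaged-score threshold with the best training accuracy.
    Label True = cognate, False = false friend."""
    scored = [(average_score(fr, en), label) for fr, en, label in pairs]
    best_t, best_acc = 0.0, -1.0
    for t in sorted({s for s, _ in scored}):
        acc = sum((s >= t) == label for s, label in scored) / len(scored)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

if __name__ == "__main__":
    # Illustrative training pairs (hypothetical, not data from the thesis).
    train = [("nation", "nation", True), ("liberté", "liberty", True),
             ("blesser", "bless", False), ("librairie", "library", False)]
    t = learn_threshold(train)
    for fr, en, _ in train:
        label = "cognate" if average_score(fr, en) >= t else "false friend"
        print(f"{fr}/{en}: {label}")
```

To combine the measures through Machine Learning, as the abstract also describes, one would instead pass the `features` vectors to a standard classifier trained on the annotated pairs; that step is not shown here. Orthography alone also has a hard limit: identically spelled false friends such as French *chair* ("flesh") and English *chair* score 1.0 on every measure above, so no orthographic threshold can separate them from true cognates.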

Keywords/Search Tags:

Cognates, False friends, Words, Languages, Different, Pairs, Data

Related items

1	On Translation Strategies Of "False Friends" In English And Chinese
2	The Study Of Korean "Scholar's Four Friends" False Biography
3	Several Pairs Of Related Words Using The Investigation And Analysis
4	Correct Word Dynasty
5	Mongolian Written Language Vocabulary Study
6	Priming Effect Of Cross Language Cognate Words In Uygur Children With Different Types Of Dyslexia
7	A Study Of Potential English Words Based On Minimal Pairs
8	Contrastive Study On "False Friends" Between Chinese And Korean And Chinese Teaching To Korean Students
9	Yen Hui Data Series Test
10	An Investigation Of Iconicity In English Words In Pairs