
An Analysis Of Semantic Merger Of English Modal Verb Can By Fuzzy C-Means Clustering

Posted on: 2013-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: W L Wang
Full Text: PDF
GTID: 2215330362463061
Subject: English Language and Literature
Abstract/Summary:
Semantic merger is a common phenomenon in daily life, especially in language. It causes considerable trouble in everyday communication and has become a difficult problem in natural language processing (NLP). To solve this problem, all the senses of a word must be determined. Word sense determination (WSD) is a key link in information retrieval, machine translation, text categorization and speech recognition. Research on WSD has made remarkable progress by drawing on mathematical methods from information theory, artificial intelligence and other fields of natural science and technology. Although WSD techniques have advanced greatly, research has focused mainly on common nouns, verbs and adjectives, and work on modal verbs is limited. Modality expresses a speaker's opinion or attitude toward a proposition and is realized mainly by modal verbs. It is therefore important to understand the precise meanings of modal verbs in order to grasp the speaker's opinion or attitude.

This thesis aims to build a highly accurate model for determining the senses of English modal verbs by means of Fuzzy C-Means Clustering. Six linguistic features, two semantic and four syntactic, are extracted by tagging, counting and analyzing a corpus of one million hundred thousand words. One of the syntactic features, together with the two semantic features, is then taken as the three input variables, and a model that determines the senses of the modal verb can is built by means of Fuzzy C-Means Clustering. The experimental results show that the sense determination accuracy of Fuzzy C-Means Clustering reaches 95%. The clustering results show the distribution of the two meanings of can. Most affirmative instances tend to have the meaning of "ability" when the subject is animate or has the inherent properties to do something. Most negated instances tend to have the meaning of "possibility" when the main verb is stative, when the main verb is an action verb and there is reference to the future, or when the subject is inanimate.

The thesis then builds three new models, each using one of the other three syntactic features together with the two semantic features, and compares their clustering results with those of the original model. The results show that semantic features contribute more to the sense determination of the English modal verb can than syntactic features do. The three linguistic features with the greatest influence on the sense determination of can are identified: Mutual Information between can and the main verb, Mutual Information between can and the subject, and negation.

The successfully established clustering model not only determines the senses of the English modal verb can in contexts of semantic merger, contributes to the automatic sense-tagging of corpora and reduces researchers' heavy workload, but also improves the quality of machine translation. The experimental results provide a useful basis for research on the semantics of modal verbs and on word sense disambiguation, and they provide evidence to guide the extraction of features of modal verbs in natural language processing.
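
The abstract does not give implementation details, so the following is only a minimal sketch of textbook Fuzzy C-Means applied to three features per instance of can. The fuzzifier m = 2, the feature names (mi_verb, mi_subject, negation), and all numeric values are assumptions made up for illustration; they are not data or settings from the thesis.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=2, m=2.0, max_iter=200, tol=1e-5, seed=0):
    """Textbook Fuzzy C-Means. Returns (centers, memberships).

    X has shape (n_samples, n_features); memberships has shape
    (n_samples, n_clusters) and each row sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # Random initial membership matrix, normalised so each row sums to 1.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        # Cluster centers: means of the data weighted by membership^m.
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]

        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero

        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U


# Synthetic, hypothetical feature vectors: [mi_verb, mi_subject, negation].
data_rng = np.random.default_rng(42)
group_a = np.column_stack([
    data_rng.normal(3.0, 0.3, 20),   # made-up MI(can, main verb)
    data_rng.normal(2.5, 0.3, 20),   # made-up MI(can, subject)
    np.zeros(20),                    # affirmative clauses
])
group_b = np.column_stack([
    data_rng.normal(0.5, 0.3, 20),
    data_rng.normal(0.4, 0.3, 20),
    np.ones(20),                     # negated clauses
])
X = np.vstack([group_a, group_b])

centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard label = cluster with highest membership
print(centers)
print(labels)
```

Each row of U gives the degree to which an instance belongs to each cluster, so borderline (merged) readings show up as memberships near 0.5 rather than being forced into one sense. Cluster indices are arbitrary; mapping them to "ability" or "possibility" would require inspecting labelled examples.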
Keywords/Search Tags: modal verb can, Fuzzy C-Means Clustering, word sense determination, feature selection